Re: [VOTE] Hadoop 1.1.2-rc5 release candidate vote

2013-02-07 Thread Matt Foley
Wow, total apathy!  We only got one vote besides mine, and that was
non-binding.
I'll try again.  Please vote on this release candidate for Hadoop 1.1.2-rc5.
Voting will close one week from now, at 10pm PST on Thursday 14 Feb.

Thanks,
--Matt


On Fri, Feb 1, 2013 at 11:28 AM, Chris Nauroth wrote:

> +1 (non-binding)
>
> I deployed hadoop-1.1.2.tar.gz to 3 Ubuntu VMs and ran NN, JT, 2NN, 2 *
> DN, and 2 * TT.  I verified the checksum.  I tested multiple command line
> HDFS interactions and MapReduce jobs.  I specifically tested for the
> HDFS-4423 blocker bug fix, and it worked.  Since that change touched
> checkpointing, I also verified that the 2NN could complete a successful
> checkpoint.
>
> I'll also verify the PGP signature once I track down the public key that
> was used for signing.
>
> Thank you,
> --Chris
>
>
> On Thu, Jan 31, 2013 at 7:13 PM, Matt Foley  wrote:
>
>> (resending with modified Subject line for RC5)
>>
>> Hadoop-1.1.2-rc4 is withdrawn.
>>
>> Hadoop-1.1.2-rc5 is available at
>> http://people.apache.org/~mattf/hadoop-1.1.2-rc5/
>> or in SVN at
>> http://svn.apache.org/viewvc/hadoop/common/tags/release-1.1.2-rc5/
>> or in the Maven repo.
>>
>> This candidate for a stabilization release of the Hadoop-1.1 branch has 24
>> patches and several cleanups compared to the Hadoop-1.1.1 release.
>> Release notes are available at
>> http://people.apache.org/~mattf/hadoop-1.1.2-rc5/releasenotes.html
>>
>> Please vote for this as the next release of Hadoop-1.  Voting will close
>> next Thursday, 7 Feb, at 3:00pm PST.
>>
>> Thanks,
>> --Matt
>>
>>


Re: Heads up - merge branch-trunk-win to trunk

2013-02-07 Thread Eli Collins
Thanks for the update Suresh.  Has any testing been done on the branch on
Linux aside from running the unit tests?

Thanks,
Eli


On Thu, Feb 7, 2013 at 5:42 PM, Suresh Srinivas wrote:

> The support for Hadoop on Windows was proposed in HADOOP-8079 almost a year
> ago. The goal was to make Hadoop natively integrated, full-featured,
> and performance and scalability tuned on Windows Server or Windows Azure.
> We are happy to announce that a lot of progress has been made in this
> regard.
>
> Initial work started in a feature branch, branch-1-win, based on branch-1.
> The details related to the work done in the branch can be seen in
> CHANGES.txt
> <http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup>.
> This work has been ported to a branch, branch-trunk-win, based on trunk.
> Merge patch for this is available on HADOOP-8562.
>
> Highlights of the work done so far:
> 1. Necessary changes in Hadoop to run natively on Windows. These changes
> handle differences in platforms related to path names, process/task
> management etc.
> 2. Addition of winutils tools for managing file permissions and ownership,
> user group mapping, hardlinks, symbolic links, chmod, disk utilization, and
> process/task management.
> 3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh,
> start and stop scripts.
> 4. Addition of block placement policy implementation to support cloud
> environment, more specifically Azure.
>
> We are very close to wrapping up the work in branch-trunk-win and getting
> ready for a merge. Currently the merge patch is passing close to 100% of
> unit tests on Linux. Soon I will call for a vote to merge this branch into
> trunk.
>
> Next steps:
> 1. Call for vote to merge branch-trunk-win to trunk, when the work
> completes and precommit build is clean.
> 2. Start a discussion on adding Jenkins precommit builds on Windows and how
> to integrate that with the existing commit process.
>
> Let me know if you have any questions.
>
> Regards,
> Suresh
>


RE: Heads up - merge branch-trunk-win to trunk

2013-02-07 Thread Mahadevan Venkatraman
It is super exciting to look at the prospect of these changes being merged to 
trunk. Having Windows as one of the supported Hadoop platforms is a fantastic 
opportunity both for the Hadoop project and Microsoft customers.

This work began around a year ago when a few of us started with a basic port 
of Hadoop on Windows. Since then, the Hadoop team at Microsoft has made 
significant progress in the following areas:
(PS: Some of these items are already included in Suresh's email, but including 
again for completeness)

- Command-line scripts for the Hadoop surface area
- Mapping the HDFS permissions model to Windows
- Abstracted and reconciled mismatches around differences in Path semantics in 
Java and Windows
- Native Task Controller for Windows 
- Implementation of a Block Placement Policy to support cloud environments, 
more specifically Azure.
- Implementation of Hadoop native libraries for Windows (compression codecs, 
native I/O)
- Fixes for several reliability issues, including race conditions, 
intermittent test failures, and resource leaks
- Several new unit test cases written for the above changes

In the process, we have closely engaged with the Apache open source community 
and have received great support and assistance from the community in the form 
of fixes, code review comments and commits. 

In addition, the Hadoop team at Microsoft has also made good progress in other 
projects including Hive, Pig, Sqoop, Oozie, HCat and HBase. Many of these 
changes have already been committed to the respective trunks with help from 
various committers and contributors. It is great to see the commitment of the 
community to support multiple platforms, and we look forward to the day when a 
developer/customer is able to successfully deploy a complete solution stack 
based on Apache Hadoop releases.

Next Steps:

All of the above changes are part of the Windows Azure HDInsight and HDInsight 
Server products from Microsoft. We have successfully on-boarded several 
internal customers and have been running production workloads on Windows Azure 
HDInsight. Our vision is to create a big data platform based on Hadoop, and we 
are committed to helping make Hadoop a world-class solution that anyone can use 
to solve their biggest data challenges. 

As an immediate next step, we would like to have a discussion around how we can 
ensure that the quality of the mainline Hadoop branches on Windows is 
maintained. To this end, we would like to get to the state where we have 
pre-checkin validation gates and nightly test runs enabled on Windows. If you 
have any suggestions around this, please do send an email.  We are committed to 
helping sustain the long-term quality of Hadoop on both Linux and Windows.

We sincerely thank the community for their contribution and support so far, and 
we hope to continue having a close engagement in the future.

-Microsoft HDInsight Team


-----Original Message-----
From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
Sent: Thursday, February 7, 2013 5:42 PM
To: common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; 
hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Heads up - merge branch-trunk-win to trunk

The support for Hadoop on Windows was proposed in HADOOP-8079 almost a year 
ago. The goal was to make Hadoop natively integrated, full-featured, and 
performance and scalability tuned on Windows Server or Windows Azure.
We are happy to announce that a lot of progress has been made in this regard.

Initial work started in a feature branch, branch-1-win, based on branch-1.
The details related to the work done in the branch can be seen in 
CHANGES.txt.
This work has been ported to a branch, branch-trunk-win, based on trunk.
Merge patch for this is available on HADOOP-8562.

Highlights of the work done so far:
1. Necessary changes in Hadoop to run natively on Windows. These changes handle 
differences in platforms related to path names, process/task management etc.
2. Addition of winutils tools for managing file permissions and ownership, user 
group mapping, hardlinks, symbolic links, chmod, disk utilization, and 
process/task management.
3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh, 
start and stop scripts.
4. Addition of block placement policy implementation to support cloud 
environment, more specifically Azure.

We are very close to wrapping up the work in branch-trunk-win and getting ready 
for a merge. Currently the merge patch is passing close to 100% of unit tests 
on Linux. Soon I will call for a vote to merge this branch into trunk.

Next steps:
1. Call for vote to merge branch-trunk-win to trunk, when the work completes 
and precommit build is clean.
2. Start a discussion on adding Jenkins precommit builds on Windows and how 

Heads up - merge branch-trunk-win to trunk

2013-02-07 Thread Suresh Srinivas
The support for Hadoop on Windows was proposed in HADOOP-8079 almost a year
ago. The goal was to make Hadoop natively integrated, full-featured,
and performance and scalability tuned on Windows Server or Windows Azure.
We are happy to announce that a lot of progress has been made in this
regard.

Initial work started in a feature branch, branch-1-win, based on branch-1.
The details related to the work done in the branch can be seen in
CHANGES.txt.
This work has been ported to a branch, branch-trunk-win, based on trunk.
Merge patch for this is available on HADOOP-8562.

Highlights of the work done so far:
1. Necessary changes in Hadoop to run natively on Windows. These changes
handle differences in platforms related to path names, process/task
management etc.
2. Addition of winutils tools for managing file permissions and ownership,
user group mapping, hardlinks, symbolic links, chmod, disk utilization, and
process/task management.
3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh,
start and stop scripts.
4. Addition of block placement policy implementation to support cloud
environment, more specifically Azure.

We are very close to wrapping up the work in branch-trunk-win and getting
ready for a merge. Currently the merge patch is passing close to 100% of
unit tests on Linux. Soon I will call for a vote to merge this branch into
trunk.

Next steps:
1. Call for vote to merge branch-trunk-win to trunk, when the work
completes and precommit build is clean.
2. Start a discussion on adding Jenkins precommit builds on Windows and how
to integrate that with the existing commit process.

Let me know if you have any questions.

Regards,
Suresh


Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Hitesh Shah
+1 (non-binding) 

Downloaded ( verified checksums ) and built from source, deployed and 
successfully ran both MR and distributed shell examples. 

-- Hitesh

On Feb 6, 2013, at 7:59 PM, Arun C Murthy wrote:

> Folks,
> 
> I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I would 
> like to release.
> 
> This release contains several major enhancements such as QJM for HDFS HA, 
> multi-resource scheduling for YARN, YARN ResourceManager restart etc.  
> Also YARN has achieved significant stability at scale (more details from Y! 
> folks here: http://s.apache.org/VYO).
> 
> The RC is available at: 
> http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
> The RC tag in svn is here: 
> http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
> 
> The maven artifacts are available via repository.apache.org.
> 
> Please try the release and vote; the vote will run for the usual 7 days.
> 
> thanks,
> Arun
> 
> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 



Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Chris Nauroth
+1 non-binding

I downloaded hadoop-2.0.3-alpha.tar.gz and verified the checksum and
signature.  I deployed to a set of Ubuntu VMs: NN, RM, 2*DN/NM, and 2NN.  I
tested a few HDFS operations and MapReduce jobs.  I verified that the 2NN
could complete a checkpoint.

Thank you,
--Chris


On Thu, Feb 7, 2013 at 1:37 PM, Thomas Graves  wrote:

> +1 (binding). I downloaded it, verified checksums, built from source,
> installed (both binary and one I built) and ran some basic jobs.
>
> Tom
>
>
>
> On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:
>
> >Folks,
> >
> >I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
> >would like to release.
> >
> >This release contains several major enhancements such as QJM for HDFS HA,
> >multi-resource scheduling for YARN, YARN ResourceManager restart etc.
> >Also YARN has achieved significant stability at scale (more details from
> >Y! folks here: http://s.apache.org/VYO).
> >
> >The RC is available at:
> >http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
> >The RC tag in svn is here:
> >http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
> >
> >The maven artifacts are available via repository.apache.org.
> >
> >Please try the release and vote; the vote will run for the usual 7 days.
> >
> >thanks,
> >Arun
> >
> >
> >
> >--
> >Arun C. Murthy
> >Hortonworks Inc.
> >http://hortonworks.com/
> >
> >
>
>


Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Thomas Graves
+1 (binding). I downloaded it, verified checksums, built from source,
installed (both binary and one I built) and ran some basic jobs.

Tom



On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:

>Folks,
>
>I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
>would like to release.
>
>This release contains several major enhancements such as QJM for HDFS HA,
>multi-resource scheduling for YARN, YARN ResourceManager restart etc.
>Also YARN has achieved significant stability at scale (more details from
>Y! folks here: http://s.apache.org/VYO).
>
>The RC is available at:
>http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
>The RC tag in svn is here:
>http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
>
>The maven artifacts are available via repository.apache.org.
>
>Please try the release and vote; the vote will run for the usual 7 days.
>
>thanks,
>Arun
>
>
>
>--
>Arun C. Murthy
>Hortonworks Inc.
>http://hortonworks.com/
>
>



Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Jitendra Pandey
Downloaded, verified checksums, installed and did basic verification.

+1 (binding)


On Thu, Feb 7, 2013 at 11:54 AM, Karthik Kambatla wrote:

> +1 (non-binding)
>
> Downloaded src tar ball, built binary tar ball, ran a couple of sample MR
> jobs on a pseudo-dist cluster.
>
> On Thu, Feb 7, 2013 at 10:24 AM, Andrew Wang wrote:
>
> > Verified the tarball checksums. Ran a couple example jobs on a 3 node
> > cluster successfully, with the same WARN caveat as Bobby.
> >
> > +1 (non-binding).
> >
> > On Thu, Feb 7, 2013 at 7:33 AM, Robert Evans wrote:
> > > I downloaded the binary package and ran a few example jobs on a 3 node
> > > cluster.  Everything seems to be working OK on it, I did see
> > >
> > > WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
> > > platform... using builtin-java classes where applicable
> > >
> > > For every shell command, but just like with 0.23.6 I don't think it is a
> > > blocker.
> > >
> > > +1 (Binding)
> > >
> > > --Bobby
> > >
> > > On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:
> > >
> > >>Folks,
> > >>
> > >>I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
> > >>would like to release.
> > >>
> > >>This release contains several major enhancements such as QJM for HDFS HA,
> > >>multi-resource scheduling for YARN, YARN ResourceManager restart etc.
> > >>Also YARN has achieved significant stability at scale (more details from
> > >>Y! folks here: http://s.apache.org/VYO).
> > >>
> > >>The RC is available at:
> > >>http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
> > >>The RC tag in svn is here:
> > >>http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
> > >>
> > >>The maven artifacts are available via repository.apache.org.
> > >>
> > >>Please try the release and vote; the vote will run for the usual 7 days.
> > >>
> > >>thanks,
> > >>Arun
> > >>
> > >>
> > >>
> > >>--
> > >>Arun C. Murthy
> > >>Hortonworks Inc.
> > >>http://hortonworks.com/
> > >>
> > >>
> > >
> >
>




Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Karthik Kambatla
+1 (non-binding)

Downloaded src tar ball, built binary tar ball, ran a couple of sample MR
jobs on a pseudo-dist cluster.

On Thu, Feb 7, 2013 at 10:24 AM, Andrew Wang wrote:

> Verified the tarball checksums. Ran a couple example jobs on a 3 node
> cluster successfully, with the same WARN caveat as Bobby.
>
> +1 (non-binding).
>
> On Thu, Feb 7, 2013 at 7:33 AM, Robert Evans  wrote:
> > I downloaded the binary package and ran a few example jobs on a 3 node
> > cluster.  Everything seems to be working OK on it, I did see
> >
> > WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
> > platform... using builtin-java classes where applicable
> >
> > For every shell command, but just like with 0.23.6 I don't think it is a
> > blocker.
> >
> > +1 (Binding)
> >
> > --Bobby
> >
> > On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:
> >
> >>Folks,
> >>
> >>I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
> >>would like to release.
> >>
> >>This release contains several major enhancements such as QJM for HDFS HA,
> >>multi-resource scheduling for YARN, YARN ResourceManager restart etc.
> >>Also YARN has achieved significant stability at scale (more details from
> >>Y! folks here: http://s.apache.org/VYO).
> >>
> >>The RC is available at:
> >>http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
> >>The RC tag in svn is here:
> >>http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
> >>
> >>The maven artifacts are available via repository.apache.org.
> >>
> >>Please try the release and vote; the vote will run for the usual 7 days.
> >>
> >>thanks,
> >>Arun
> >>
> >>
> >>
> >>--
> >>Arun C. Murthy
> >>Hortonworks Inc.
> >>http://hortonworks.com/
> >>
> >>
> >
>


Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Andrew Wang
Verified the tarball checksums. Ran a couple example jobs on a 3 node
cluster successfully, with the same WARN caveat as Bobby.

+1 (non-binding).

On Thu, Feb 7, 2013 at 7:33 AM, Robert Evans  wrote:
> I downloaded the binary package and ran a few example jobs on a 3 node
> cluster.  Everything seems to be working OK on it, I did see
>
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable
>
> For every shell command, but just like with 0.23.6 I don't think it is a
> blocker.
>
> +1 (Binding)
>
> --Bobby
>
> On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:
>
>>Folks,
>>
>>I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
>>would like to release.
>>
>>This release contains several major enhancements such as QJM for HDFS HA,
>>multi-resource scheduling for YARN, YARN ResourceManager restart etc.
>>Also YARN has achieved significant stability at scale (more details from
>>Y! folks here: http://s.apache.org/VYO).
>>
>>The RC is available at:
>>http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
>>The RC tag in svn is here:
>>http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
>>
>>The maven artifacts are available via repository.apache.org.
>>
>>Please try the release and vote; the vote will run for the usual 7 days.
>>
>>thanks,
>>Arun
>>
>>
>>
>>--
>>Arun C. Murthy
>>Hortonworks Inc.
>>http://hortonworks.com/
>>
>>
>


Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Alejandro Abdelnur
+1. Downloaded SRC, verified MD5 and signature, did a full build,
configured, started up HDFS, YARN, HTTPS, ran a couple of example MR
jobs, tested HTTPS access to HDFS.
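
For anyone repeating the checksum step, here is a minimal sketch of one way to do it in Java. ChecksumCheck and hexDigest are hypothetical names, and the expected digest is assumed to be copied from the checksum file published next to the RC. It does not cover the PGP signature check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: computes a file digest and compares it to a published value. */
final class ChecksumCheck {

    /** Streams the file through the named digest (e.g. "MD5") and returns lowercase hex. */
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Reading is enough; DigestInputStream updates the digest as bytes pass through.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ChecksumCheck <tarball> <expected-md5-hex>");
            return;
        }
        String actual = hexDigest(Paths.get(args[0]), "MD5");
        // The published digest may be upper case or contain spaces, so normalize before comparing.
        String expected = args[1].replace(" ", "").toLowerCase();
        System.out.println(actual.equals(expected) ? "checksum OK" : "checksum MISMATCH: " + actual);
    }
}
```

The same method works with "SHA-1" or "SHA-256" as the algorithm name if the release publishes those digests instead.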

Thanks for driving this release Arun.


On Thu, Feb 7, 2013 at 7:33 AM, Robert Evans  wrote:

> I downloaded the binary package and ran a few example jobs on a 3 node
> cluster.  Everything seems to be working OK on it, I did see
>
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable
>
> For every shell command, but just like with 0.23.6 I don't think it is a
> blocker.
>
> +1 (Binding)
>
> --Bobby
>
> On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:
>
> >Folks,
> >
> >I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
> >would like to release.
> >
> >This release contains several major enhancements such as QJM for HDFS HA,
> >multi-resource scheduling for YARN, YARN ResourceManager restart etc.
> >Also YARN has achieved significant stability at scale (more details from
> >Y! folks here: http://s.apache.org/VYO).
> >
> >The RC is available at:
> >http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
> >The RC tag in svn is here:
> >http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
> >
> >The maven artifacts are available via repository.apache.org.
> >
> >Please try the release and vote; the vote will run for the usual 7 days.
> >
> >thanks,
> >Arun
> >
> >
> >
> >--
> >Arun C. Murthy
> >Hortonworks Inc.
> >http://hortonworks.com/
> >
> >
>
>


-- 
Alejandro


Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Robert Evans
I downloaded the binary package and ran a few example jobs on a 3 node
cluster.  Everything seems to be working OK on it, I did see

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable

For every shell command, but just like with 0.23.6 I don't think it is a
blocker.

+1 (Binding)

--Bobby

On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:

>Folks,
>
>I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
>would like to release.
>
>This release contains several major enhancements such as QJM for HDFS HA,
>multi-resource scheduling for YARN, YARN ResourceManager restart etc.
>Also YARN has achieved significant stability at scale (more details from
>Y! folks here: http://s.apache.org/VYO).
>
>The RC is available at:
>http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
>The RC tag in svn is here:
>http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
>
>The maven artifacts are available via repository.apache.org.
>
>Please try the release and vote; the vote will run for the usual 7 days.
>
>thanks,
>Arun
>
>
>
>--
>Arun C. Murthy
>Hortonworks Inc.
>http://hortonworks.com/
>
>



[jira] [Created] (HADOOP-9291) enhance unit-test coverage of package o.a.h.metrics2

2013-02-07 Thread Ivan A. Veselovsky (JIRA)
Ivan A. Veselovsky created HADOOP-9291:
--

 Summary: enhance unit-test coverage of package o.a.h.metrics2
 Key: HADOOP-9291
 URL: https://issues.apache.org/jira/browse/HADOOP-9291
 Project: Hadoop Common
  Issue Type: Test
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: More information regarding the Project suggestions given on the Hadoop website

2013-02-07 Thread Robert Evans
This conversation is probably better suited to common-user@, so I am moving it
over there; I put common-dev@ in the BCC.

I am not really sure what you mean by validate.  I assume you want to test
that your library does what you want it to do.  I would start out with
unit tests to validate that the individual pieces work as you designed them to.
After that you want to do some system-level testing.  When I port an
algorithm over to Hadoop, I typically have one of two goals: I either want
to reproduce the original algorithm exactly, or I want to create a
good-enough approximation of it that is extremely scalable.

If you recreated the algorithm exactly you could validate it against the
single computer reference implementation and check that the results are
identical.  With machine learning this is often difficult because many
algorithms use random numbers as part of the process.  To get around this
you sometimes have to modify both implementations to be able to use a
consistent set of pseudo-random numbers.
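
As a rough sketch of the seeding idea, assuming a bit-flip mutation step as the random part of a genetic algorithm: SeededMutation, seedFor and mutate are made-up names for illustration, and deriving one seed per individual is only one assumption about how to keep results independent of the order in which tasks run.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical example: a mutation step whose randomness comes only from a supplied seed. */
final class SeededMutation {

    /** Derives a per-individual seed so the outcome does not depend on processing order. */
    static long seedFor(long baseSeed, int individualId) {
        return baseSeed * 31L + individualId;
    }

    /** Returns a copy of the genome with each gene flipped with probability rate. */
    static boolean[] mutate(boolean[] genome, double rate, Random rng) {
        boolean[] out = Arrays.copyOf(genome, genome.length);
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < rate) {
                out[i] = !out[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] genome = new boolean[16];
        long baseSeed = 42L;
        // Stand-ins for the reference run and the ported run: same seed, same individual.
        boolean[] reference = mutate(genome, 0.25, new Random(seedFor(baseSeed, 7)));
        boolean[] ported = mutate(genome, 0.25, new Random(seedFor(baseSeed, 7)));
        System.out.println(Arrays.equals(reference, ported)); // prints true
    }
}
```

If both implementations take their random numbers this way, their outputs can be compared element by element instead of only approximately.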

The other alternative is to use statistics, and this works fairly well no
matter how you ported the algorithm.  Train on the same input data
multiple times with each implementation, then compare the results against a
test set.  As grad students you probably already understand the stats
necessary to do this correctly.  Your advisor will probably be able to give
you better advice on this too, because they can sit down with you and give
you much faster feedback.
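
One possible way to do the comparison, sketched under the assumption that each training run is scored by a single number (say, accuracy on the test set): compute Welch's t-statistic over the two sets of scores. WelchT is a hypothetical name, the scores in main are made-up placeholders, and the choice of Welch's test is mine, not something the algorithm requires.

```java
/** Hypothetical example: Welch's t-statistic for two sets of per-run scores. */
final class WelchT {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1); needs at least two values. */
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (xs.length - 1);
    }

    /** t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b). */
    static double tStatistic(double[] a, double[] b) {
        double standardError =
                Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
        return (mean(a) - mean(b)) / standardError;
    }

    public static void main(String[] args) {
        // Placeholder scores, one per training run; replace with real measurements.
        double[] referenceScores = {0.81, 0.79, 0.83, 0.80, 0.82};
        double[] hadoopScores = {0.80, 0.78, 0.84, 0.79, 0.81};
        System.out.println("t = " + tStatistic(referenceScores, hadoopScores));
    }
}
```

A t value close to zero is consistent with the two implementations behaving alike; deciding how close is close enough still needs the degrees of freedom and a significance level, which this sketch leaves out.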

--Bobby

On 2/7/13 12:55 AM, "Varsha Raveendran" 
wrote:

>Hello!
>
>
>Based on a couple of existing genetic algorithm libraries available on the
>net, my team and I have come up with a design for the library. But we are
>not able to understand how to validate the library.
>
>Are there any test designs that are followed to check whether a library is
>working correctly?
>
>
>I would like to again mention that we are graduate students and have just
>started working on Hadoop.
>
>Thanks in advance,
>Varsha
>
>
>
>On Sat, Jan 19, 2013 at 9:42 AM, Varsha Raveendran <
>varsha.raveend...@gmail.com> wrote:
>
>> Thank you! I will check with the Mahout team and also go through Commons
>> Math site.
>>
>> Thanks & Regards,
>> Varsha
>>
>>
>> On Sat, Jan 19, 2013 at 12:16 AM, Robert Evans
>>wrote:
>>
>>> I'm not sure I am exactly the right person for this, but I assume that you
>>> are familiar with genetic algorithms.  The Mahout Project is probably a
>>> good place to start (http://mahout.apache.org/); they have a number of
>>> machine learning algorithms that run on top of Hadoop.  I did a search and
>>> it looks like there may already be some support for them in Mahout, but I
>>> don't know the current state of it.  It looked like there was some
>>> discussion about it being abandoned and possibly deleted.  Either way it
>>> would be a good starting point.  Commons Math may be a good place to look
>>> too, because there is an implementation there that is already Apache
>>> licensed, so if you borrow some of the code there is no issue:
>>> http://commons.apache.org/math/userguide/genetics.html
>>>
>>> --Bobby Evans
>>>
>>> On 1/16/13 8:24 AM, "Varsha Raveendran" 
>>> wrote:
>>>
>>> >Hello!
>>> >
>>> >I require information regarding a project given on the Hadoop website. Can
>>> >anyone guide me in the right direction?
>>> >
>>> >The project is "Implement a library/framework to support Genetic
>>> >Algorithms on Hadoop Map-Reduce."
>>> >
>>> >
>>> >Regards,
>>> >Varsha
>>> >
>>> >New to Hadoop :)
>>>
>>>
>>
>>
>> --
>> *-Varsha *
>>
>
>
>
>-- 
>*-Varsha *



[jira] [Resolved] (HADOOP-9124) SortedMapWritable violates contract of Map interface for equals() and hashCode()

2013-02-07 Thread Tom White (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White resolved HADOOP-9124.
---

   Resolution: Fixed
Fix Version/s: 1.2.0

Committed to branch 1.

> SortedMapWritable violates contract of Map interface for equals() and 
> hashCode()
> 
>
> Key: HADOOP-9124
> URL: https://issues.apache.org/jira/browse/HADOOP-9124
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Affects Versions: 1.1.1, 2.0.2-alpha
>Reporter: Patrick Hunt
>Assignee: Surenkumar Nihalani
>Priority: Minor
> Fix For: 1.2.0, 2.0.3-alpha, 0.23.7
>
> Attachments: hadoop-9124-branch1.patch, HADOOP-9124.patch, 
> HADOOP-9124.patch, HADOOP-9124.patch, HADOOP-9124.patch, HADOOP-9124.patch, 
> HADOOP-9124.patch, HADOOP-9124.patch, HADOOP-9124.patch, HADOOP-9124.patch, 
> HADOOP-9124.patch
>
>
> This issue is similar to HADOOP-7153. It was found when using MRUnit - see 
> MRUNIT-158, specifically 
> https://issues.apache.org/jira/browse/MRUNIT-158?focusedCommentId=13501985&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13501985
> --
> o.a.h.io.SortedMapWritable implements the java.util.Map interface, however it 
> does not define an implementation of the equals() or hashCode() methods; 
> instead the default implementations in java.lang.Object are used.
> This violates the contract of the Map interface which defines different 
> behaviour for equals() and hashCode() than Object does. More information 
> here: 
> http://download.oracle.com/javase/6/docs/api/java/util/Map.html#equals(java.lang.Object)
> The practical consequence is that SortedMapWritables containing equal entries 
> cannot be compared properly. We were bitten by this when trying to write an 
> MRUnit test for a Mapper that outputs MapWritables; the MRUnit driver cannot 
> test the equality of the expected and actual MapWritable objects.
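
A minimal sketch of what the Map contract asks for, using a hypothetical ContractMap class built on java.util.AbstractMap. This is not the committed patch and does not touch SortedMapWritable itself; it only illustrates the rule that two maps are equal when their entry sets are equal, and that the hash code is derived from the entries.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical map wrapper whose equals() and hashCode() follow the java.util.Map contract. */
final class ContractMap extends AbstractMap<String, Integer> {
    private final SortedMap<String, Integer> entries = new TreeMap<>();

    @Override
    public Integer put(String key, Integer value) {
        return entries.put(key, value);
    }

    @Override
    public Set<Map.Entry<String, Integer>> entrySet() {
        return entries.entrySet();
    }

    /** Equal to any Map holding the same mappings, not only to another ContractMap. */
    @Override
    public boolean equals(Object obj) {
        if (obj == this) {
            return true;
        }
        if (!(obj instanceof Map)) {
            return false;
        }
        return entrySet().equals(((Map<?, ?>) obj).entrySet());
    }

    /** Sum of the entries' hash codes, as Map.hashCode() specifies. */
    @Override
    public int hashCode() {
        return entrySet().hashCode();
    }

    public static void main(String[] args) {
        ContractMap a = new ContractMap();
        a.put("x", 1);
        a.put("y", 2);
        ContractMap b = new ContractMap();
        b.put("y", 2);
        b.put("x", 1);
        Map<String, Integer> plain = new HashMap<>();
        plain.put("x", 1);
        plain.put("y", 2);
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true
        System.out.println(a.equals(plain));              // prints true
    }
}
```

With the default implementations from java.lang.Object, the first and third comparisons would be false, because Object.equals() only tests reference identity. That is the failure mode described above for the MRUnit test.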
