Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Dhruba Borthakur
I agree with Owen. If we move code out of the contrib project, then it is
more likely to create confusion among users, especially when multiple
versions of the code base float around.

But I agree that we should purge contrib code that is not being used or not
being actively developed.

thanks,
dhruba


On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley  wrote:

>
> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
>
>  Now that http://apache-extras.org is launched (
>> https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
>> I'd like to start a discussion on moving contrib components out of common,
>> mapreduce, and hdfs.
>>
>
> The PMC can't "move" code to Apache extras. It can only choose to abandon
> code that it doesn't want to support any longer. As a separate action some
> group of developers may create projects in Apache Extras based on the code
> from Hadoop.
>
> Therefore the question is really what if any code Hadoop wants to abandon.
> That is a good question and one that we should ask ourselves occasionally.
>
> After a quick consideration, my personal list would look like:
>
> failmon
> fault injection
> fuse-dfs
> hod
> kfs
>
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never go to the work to figure out
> which fork and branch of hadoop-gpl-compression work with the version of
> Hadoop they installed.
>
> -- Owen
>
>


-- 
Connect to me at http://www.facebook.com/dhruba


Append file is working in on hadoop 0.20

2011-01-31 Thread Alessandro Binhara
Hello ...

I need to append to files in HDFS.
I have seen many forum posts on the internet talking about problems with
appending to files in HDFS.
Is that correct?

Is file append working on Hadoop 0.20?

Thanks


Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Steve Loughran

On 31/01/11 05:24, Konstantin Boudnik wrote:

Shall we not dictate a location for contrib projects once they are
moved out of Hadoop? If people feel they would be better served by GitHub,
perhaps they should have the option to get hosted there?



-I see discussions about Git at the ASF infra mailing lists

-the stuff in contrib is code contributed to Apache; it should still live 
there if we can keep it going. Which means people have to step up, or we 
put it in some attic.




Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Steve Loughran

On 31/01/11 03:42, Nigel Daley wrote:

Folks,

Now that http://apache-extras.org is launched 
(https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
 I'd like to start a discussion on moving contrib components out of common, 
mapreduce, and hdfs.

These contrib components complicate the builds, cause test failures that nobody 
seems to care about, have releases that are tied to Hadoop's long release 
cycles, etc.  Most folks I've talked with agree that these contrib components 
would be better served by being pulled out of Hadoop and hosted elsewhere. The 
new apache-extras code hosting site seems like a natural *default* location for 
migrating these contrib projects.  Perhaps some should graduate from contrib to 
src (i.e. from contrib to the core of the project they're included in).  If folks 
agree, we'll need to come up with a mapping of each contrib component to its final 
destination and file a JIRA.

Here are the contrib components by project (hopefully I didn't miss any).

Common Contrib:
   failmon
   hod
   test


MapReduce Contrib:
   capacity-scheduler -- move to MR core?
   data_join
   dynamic-scheduler
   eclipse-plugin
   fairscheduler -- move to MR core?
   gridmix
   index
   mrunit
   mumak
   raid
   sqoop
   streaming -- move to MR core?
   vaidya
   vertica



+1 for the schedulers in core
+1 for streaming


For the "accessories",they are really separate projects that work on 
with Hadoop, but could have separate release schedules


 -move them to incubation, try and staff them.
 -if they aren't resourced, then that means they are dead code

I'm -1 to having any support for filesystems other than Posix and HDFS 
in there, =0 on S3, but it's used widely enough it should stay in, 
especially as amazon do apparently provide some funding for testing.


Because, as Nigel points out, testing is the enemy. If you don't have 
the implementation of the filesystem in question, there is no way to be 
sure that a change works; you can't use it, release it saying "it 
works", or field bug reports.


Testing and releasing of filesystem interfaces should be the 
responsibility of the filesystem suppliers or whoever wants to develop 
the bridge from the FS to Hadoop.


This raises another issue I've been thinking about recently: how do 
you define "compatibility"? If, for example, my colleagues and I were to 
stand up and say "our FS is compatible with Apache Hadoop", what does that 
mean?


-Steve


Defining Compatibility

2011-01-31 Thread Steve Loughran
what does it mean to be compatible with Hadoop? And how do products that 
consider themselves compatible with Hadoop say it?


We have plugin schedulers and the like, and all is well, and the Apache 
brand people keep an eye on distributions of the Hadoop code and make 
sure that Apache Hadoop is cleanly distinguished from redistributions of 
binaries by third parties.


But then you get distributions, and you have to define what is meant in 
terms of functionality and compatibility


Presumably, everyone who issues their own release has either explicitly 
or implicitly done a lot more testing than is in the unit test suite, 
testing that exists to stress-test the code on large clusters. Is there 
stuff there that needs to be added to SVN to help say a build is of 
sufficient quality to be released?


Then there are the questions about

-things that work with specific versions/releases of Hadoop?
-replacement filesystems ?
-replacement of core parts of the system, like the MapReduce Engine?

IBM have been talking about "Hadoop on GPFS"
http://www.almaden.ibm.com/storagesystems/projects/hadoop/

If this is running the MR layer, should it say "Apache Hadoop MR engine 
on top of IBM GPFS", or what? And how do you define or assess 
compatibility at this point? Is it up to the vendor to say "works with 
Apache Hadoop", and is running the Terasort client code sufficient to 
say "compatible"?


Similarly, if the MapReduce engine gets swapped out, what then? We in HP 
Labs have been funding some exploratory work at universities in Berlin 
on an engine that does more operations than just map and reduce, but it 
will also handle the existing operations with API compatibility on the 
worker nodes. The goal here is research with an OSS deliverable, but 
while it may support Hadoop jobs, it's not Hadoop.


What to call such things?


-Steve



Re: Append file is working in on hadoop 0.20

2011-01-31 Thread Ian Holsman
Hi Alessandro.

There is a debate among the experts about whether the 0.20 version of append is 
stable, and whether it is technically the best way of doing it. 

That said, there are lots of people and companies who are using it in their 
production clusters with no reported hassles.

I suggest you test it out in your own environment to make sure it works for 
you. 
This is something you should be doing anyway. 
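
[A minimal smoke-test sketch along these lines, for readers who want to try it: this is an
illustrative example only, not an official recipe. The dfs.support.append flag, the path, and
the class name are assumptions to check against your own 0.20-era cluster.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendSmokeTest {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: 0.20-era clusters generally had to enable append explicitly;
        // the flag name and default vary by release, so check your own cluster.
        conf.setBoolean("dfs.support.append", true);

        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/append-smoke-test.txt");  // illustrative path

        // Write one record, close, then reopen the same file for append.
        FSDataOutputStream out = fs.create(p, true);
        out.writeBytes("first line\n");
        out.close();

        out = fs.append(p);
        out.writeBytes("appended line\n");
        out.close();

        // Both records should be reflected in the file length afterwards.
        System.out.println("final length = " + fs.getFileStatus(p).getLen());
      }
    }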
 
On Jan 31, 2011, at 6:28 AM, Alessandro Binhara wrote:

> Hello ...
> 
> I need append file in HDFS ..
> I see many forum on internet talking about problems in HDFS to append
> files.
> That´s is correct ?
> 
> Append file is working in on hadoop 0.20 ?
> 
> thank ´s



Re: Defining Compatibility

2011-01-31 Thread Chris Douglas
Steve-

It's hard to answer without more concrete criteria. Is this a
trademark question affecting the marketing of a product? A
cross-compatibility taxonomy for users? The minimum criteria to
publish a paper/release a product without eye-rolling? The particular
compatibility claims made by a system will be nuanced and specific; a
runtime that executes MapReduce jobs as they would run in Hadoop can
simply make that claim, whether it uses parts of MapReduce, HDFS, or
neither.

For the various distributions "Powered by Apache Hadoop," one would
assume that compatibility will vary depending on the featureset and
the audience. A distribution that runs MapReduce applications
as-written for Apache Hadoop may be incompatible with a user's
deployed metrics/monitoring system. Some random script to scrape the
UI may not work. The product may only scale to 20 nodes. Whether these
are "compatible with Apache Hadoop" is awkward to answer generally,
unless we want to define the semantics of that phrase by policy.

To put it bluntly, why would we bother to define such a policy? One
could assert that a fully-compatible system would implement all the
public/stable APIs as defined in HADOOP-5073, but who would that help?
And though interoperability is certainly relevant to systems built on
top of Hadoop, is there a reason the Apache project needs to be
involved in defining the standards for compatibility among them?

Compatibility matters, but I'm not clear on the objective of this discussion. -C
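
[For reference, the HADOOP-5073 classification mentioned above is expressed as Java
annotations. Below is a hedged sketch of how a compatibility claim could be scoped to the
public/stable surface; the RecordSink classes are hypothetical, and this assumes the
org.apache.hadoop.classification annotations are available on the classpath.]

    import org.apache.hadoop.classification.InterfaceAudience;
    import org.apache.hadoop.classification.InterfaceStability;

    /**
     * Hypothetical public API: a compatibility claim could be scoped to
     * classes carrying these HADOOP-5073 style markers.
     */
    @InterfaceAudience.Public
    @InterfaceStability.Stable
    public class RecordSink {
      /** Part of the public/stable surface a "compatible" runtime would honor. */
      public void write(String record) {
        // ...
      }
    }

    /** Internals carry no such promise and may change between releases. */
    @InterfaceAudience.Private
    @InterfaceStability.Unstable
    class RecordSinkBuffer {
      // implementation detail, free to change
    }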

On Mon, Jan 31, 2011 at 5:18 AM, Steve Loughran  wrote:
> what does it mean to be compatible with Hadoop? And how do products that
> consider themselves compatible with Hadoop say it?
>
> We have plugin schedulers and the like, and all is well, and the Apache
> brand people keep an eye on distributions of the Hadoop code and make sure
> that Apache Hadoop is cleanly distinguished from redistributions of binaries
> by third parties.
>
> But then you get distributions, and you have to define what is meant in
> terms of functionality and compatibility
>
> Presumably, everyone who issues their own release has either explicitly or
> implicitly done a lot more testing than is in the unit test suite, testing
> that exists to stress test the code on large clusters -is there stuff there
> that needs to be added to SVN to help say a build is of sufficiently quality
> to be released?
>
> Then there are the questions about
>
> -things that work with specific versions/releases of Hadoop?
> -replacement filesystems ?
> -replacement of core parts of the system, like the MapReduce Engine?
>
> IBM have have been talking about "Hadoop on GPFS"
> http://www.almaden.ibm.com/storagesystems/projects/hadoop/
>
> If this is running the MR layer, should it say "Apache Hadoop MR engine on
> top of IBM GPFS", or what -and how do you define or assess compatibility at
> this point? Is it up to the vendor to say "works with Apache Hadoop", and is
> running the Terasort client code sufficient to say "compatible"?
>
> Similarly, if the MapReduce engine gets swapped out, what then? We in HP
> Labs have been funding some exploratory work at universities in Berlin on an
> engine that does more operations than just map and reduce, but it will also
> handle the existing operations with API compatibility on the worker nodes.
> The goal here is research with an OSS deliverable, but while it may support
> Hadoop jobs, it's not Hadoop.
>
> What to call such things?
>
>
> -Steve
>
>


Re: Defining Compatibility

2011-01-31 Thread Ian Holsman

On Jan 31, 2011, at 8:18 AM, Steve Loughran wrote:

> what does it mean to be compatible with Hadoop? And how do products that 
> consider themselves compatible with Hadoop say it?

I would like to define it in terms of APIs and core functionality.

A product (say Hive or Pig) will run against a set of well-defined APIs for a 
given version.
Regardless of who implements the API, it should perform as promised, so 
switching between distributions or implementations (say HDFS over GFS) should 
not give the end user any surprises.

That said, HDFS over GFS may expose a superset of APIs that a tool may 
utilize. 
If the tool requires those APIs it is no longer compatible with Apache Hadoop, 
and should not be called such.

For example, early versions of HUE required the Thrift API to be present. So it 
would clearly not be compatible with Apache Hadoop 0.20.

What still perplexes me is what to do when some core functionality (say the 
append patch) that presents an identical API to the end user is promoted as 
compatible... 
I classify this change as an 'end-user surprise', *BUT* HDFS over GFS would also 
have similar surprises, where the API is the same but implemented very 
differently.

So I'm still not sure if you would classify it as being 0.20 compatible.
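
[One way to read that point in code: a tool that only goes through the common FileSystem
abstraction leaves the choice of implementation to configuration, which is the sense in which
switching implementations should not surprise the end user. A minimal sketch; the class and
method names are illustrative.]

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FirstLine {
      /**
       * Reads the first line of a file through the FileSystem abstraction.
       * Which implementation answers (HDFS, the local FS, or a third-party
       * filesystem) is decided by the URI scheme and configuration, not here.
       */
      public static String firstLine(Configuration conf, String uri) throws Exception {
        Path path = new Path(uri);
        FileSystem fs = FileSystem.get(path.toUri(), conf);
        BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(path)));
        try {
          return in.readLine();
        } finally {
          in.close();
        }
      }
    }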

> 
> We have plugin schedulers and the like, and all is well, and the Apache brand 
> people keep an eye on distributions of the Hadoop code and make sure that 
> Apache Hadoop is cleanly distinguished from redistributions of binaries by 
> third parties.
> 
> But then you get distributions, and you have to define what is meant in terms 
> of functionality and compatibility
> 
> Presumably, everyone who issues their own release has either explicitly or 
> implicitly done a lot more testing than is in the unit test suite, testing 
> that exists to stress test the code on large clusters -is there stuff there 
> that needs to be added to SVN to help say a build is of sufficiently quality 
> to be released?
> 
> Then there are the questions about
> 
> -things that work with specific versions/releases of Hadoop?
> -replacement filesystems ?
> -replacement of core parts of the system, like the MapReduce Engine?
> 
> IBM have have been talking about "Hadoop on GPFS"
> http://www.almaden.ibm.com/storagesystems/projects/hadoop/
> 
> If this is running the MR layer, should it say "Apache Hadoop MR engine on 
> top of IBM GPFS", or what -and how do you define or assess compatibility at 
> this point? Is it up to the vendor to say "works with Apache Hadoop", and is 
> running the Terasort client code sufficient to say "compatible"?
> 
> Similarly, if the MapReduce engine gets swapped out, what then? We in HP Labs 
> have been funding some exploratory work at universities in Berlin on an 
> engine that does more operations than just map and reduce, but it will also 
> handle the existing operations with API compatibility on the worker nodes. 
> The goal here is research with an OSS deliverable, but while it may support 
> Hadoop jobs, it's not Hadoop.
> 
> What to call such things?
> 
> 
> -Steve
> 



Re: Defining Compatibility

2011-01-31 Thread Steve Loughran

On 31/01/11 14:32, Chris Douglas wrote:

Steve-

It's hard to answer without more concrete criteria. Is this a
trademark question affecting the marketing of a product? A
cross-compatibility taxonomy for users? The minimum criteria to
publish a paper/release a product without eye-rolling? The particular
compatibility claims made by a system will be nuanced and specific; a
runtime that executes MapReduce jobs as they would run in Hadoop can
simply make that claim, whether it uses parts of MapReduce, HDFS, or
neither.


No, I'm thinking more about what large-scale tests need to be run 
against the codebase before you can say "it works", and then how to say 
that after some change it still works.




For the various distributions "Powered by Apache Hadoop," one would
assume that compatibility will vary depending on the featureset and
the audience. A distribution that runs MapReduce applications
as-written for Apache Hadoop may be incompatible with a user's
deployed metrics/monitoring system. Some random script to scrape the
UI may not work. The product may only scale to 20 nodes. Whether these
are "compatible with Apache Hadoop" is awkward to answer generally,
unless we want to define the semantics of that phrase by policy.

To put it bluntly, why would we bother to define such a policy? One
could assert that a fully-compatible system would implement all the
public/stable APIs as defined in HADOOP-5073, but who would that help?
And though interoperability is certainly relevant to systems built on
top of Hadoop, is there a reason the Apache project needs to be
involved in defining the standards for compatibility among them?


Agreed, I'm just thinking about naming and definitions. Even with the 
stable/unstable and internal/external split, there's still the question as 
to what the semantics of operations are, both explicit (this operation 
does X) and implicit (and it takes less than Y seconds to do it). It's 
those implicit things that always catch you out (indeed, they are the 
argument points in things like the Java and Java EE compatibility test kits).
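
[To make the explicit side concrete, here is a hedged sketch of a contract-style test that
pins down the stated semantics of a couple of FileSystem operations; it runs against the local
filesystem so it stays self-contained, and the class name is illustrative. The implicit
properties (latency, scale) are exactly what a unit test like this cannot capture.]

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;

    public class FileSystemContractSketch {

      // The local filesystem keeps the sketch self-contained; a vendor claiming
      // compatibility would point this at their own implementation instead.
      private FileSystem newFs() throws Exception {
        return FileSystem.getLocal(new Configuration());
      }

      @Test
      public void createdFileIsVisibleAndDeleteReportsTruthfully() throws Exception {
        FileSystem fs = newFs();
        Path p = new Path(System.getProperty("java.io.tmpdir"), "contract-sketch.txt");

        FSDataOutputStream out = fs.create(p, true);
        out.writeBytes("hello\n");
        out.close();

        assertTrue("a created file must be visible", fs.exists(p));
        assertTrue("deleting an existing file returns true", fs.delete(p, false));
        assertFalse("deleting a missing file returns false", fs.delete(p, false));
      }
    }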


Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Konstantin Boudnik
On Sun, Jan 30, 2011 at 23:19, Owen O'Malley  wrote:
>
> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
>
>> Now that http://apache-extras.org is launched
>> (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
>> I'd like to start a discussion on moving contrib components out of common,
>> mapreduce, and hdfs.
>
> The PMC can't "move" code to Apache extras. It can only choose to abandon
> code that it doesn't want to support any longer. As a separate action some
> group of developers may create projects in Apache Extras based on the code
> from Hadoop.
>
> Therefore the question is really what if any code Hadoop wants to abandon.
> That is a good question and one that we should ask ourselves occasionally.
>
> After a quick consideration, my personal list would look like:
>
> failmon
> fault injection

This is the best way to kill a component as tightly coupled to the
core code as fault injection is.

So, if you really want to kill it, then move it.

> fuse-dfs
> hod
> kfs
>
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never go to the work to figure out
> which fork and branch of hadoop-gpl-compression work with the version of
> Hadoop they installed.
>
> -- Owen
>
>


Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Konstantin Boudnik
On Mon, Jan 31, 2011 at 03:43, Steve Loughran  wrote:
> On 31/01/11 05:24, Konstantin Boudnik wrote:
>>
>> Shall we not dictate a location of contrib projects once they are
>> moved of Hadoop? If ppl feel like they are better be served by GitHub
>> perhaps they should have an option to get hosted there?
>
>
> -I see discussions about Git at the ASF infra mailing lists

Then I withdraw my earlier opinion about github vs. *-extras

> -the stuff in contrib is code contributed to apache, should still live there
> if we can keep it going. Which means people have to step up, or we put it in
> some attic
>
>


Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Konstantin Shvachko
Sending this to general to attract urgent attention.
Both HDFS and MapReduce are not compiling since
HADOOP-6904 and its HDFS and MR counterparts were committed.
The problem is not with this patch as described below, but I think those
commits should be reverted if the Common integration build cannot be
restored promptly.

Thanks,
--Konstantin


On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
wrote:

> I see Hadoop-common-trunk-Commit is failing and not sending any emails.
> It times out on native compilation and aborts.
> Therefore changes are not integrated, and now it lead to hdfs and mapreduce
> both not compiling.
> Can somebody please take a look at this.
> The last few lines of the build are below.
>
> Thanks
> --Konstantin
>
> [javah] [Loaded 
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
>
> [javah] [Loaded 
> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
> [javah] [Forcefully writing file 
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
>
>  [exec] checking for gcc... gcc
>  [exec] checking whether the C compiler works... yes
>  [exec] checking for C compiler default output file name... a.out
>  [exec] checking for suffix of executables...
>
> Build timed out. Aborting
> Build was aborted
> [FINDBUGS] Skipping publisher since build result is ABORTED
> Publishing Javadoc
> Archiving artifacts
> Recording test results
> No test report files were found. Configuration error?
>
> Recording fingerprints
>  [exec] Terminated
> Publishing Clover coverage report...
> No Clover report will be published due to a Build Failure
> No emails were triggered.
> Finished: ABORTED
>
>
>
>


Re: MRUnit

2011-01-31 Thread Allen Wittenauer


On Jan 30, 2011, at 7:59 PM, Nigel Daley wrote:

> +1.  I just started a thread on moving all components out of contrib.


Pushing them randomly to the Web 2.0 version of SourceForge (where they 
will never be heard from again) doesn't sound like a decent, long-term strategy.



Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Eli Collins
Hey Konstantin,

The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
which was fixed.  Trees from trunk are compiling against each other
for me (e.g. each installed to a local Maven repo); perhaps the upstream
Maven repo hasn't been updated with the latest bits yet.

Thanks,
Eli

On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
 wrote:
> Sending this to general to attract urgent attention.
> Both HDFS and MapReduce are not compiling since
> HADOOP-6904 and its hdfs and MP counterparts were committed.
> The problem is not with this patch as described below, but I think those
> commits should be reversed if Common integration build cannot be
> restored promptly.
>
> Thanks,
> --Konstantin
>
>
> On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
> wrote:
>
>> I see Hadoop-common-trunk-Commit is failing and not sending any emails.
>> It times out on native compilation and aborts.
>> Therefore changes are not integrated, and now it lead to hdfs and mapreduce
>> both not compiling.
>> Can somebody please take a look at this.
>> The last few lines of the build are below.
>>
>> Thanks
>> --Konstantin
>>
>>     [javah] [Loaded 
>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
>>
>>     [javah] [Loaded 
>> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
>>     [javah] [Forcefully writing file 
>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
>>
>>      [exec] checking for gcc... gcc
>>      [exec] checking whether the C compiler works... yes
>>      [exec] checking for C compiler default output file name... a.out
>>      [exec] checking for suffix of executables...
>>
>> Build timed out. Aborting
>> Build was aborted
>> [FINDBUGS] Skipping publisher since build result is ABORTED
>> Publishing Javadoc
>> Archiving artifacts
>> Recording test results
>> No test report files were found. Configuration error?
>>
>> Recording fingerprints
>>      [exec] Terminated
>> Publishing Clover coverage report...
>> No Clover report will be published due to a Build Failure
>> No emails were triggered.
>> Finished: ABORTED
>>
>>
>>
>>
>


Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Ted Dunning
There has been a problem with more than one build failing (Mahout is the one
that I saw first) due to a change in the Maven version, which meant that the
Clover license isn't being found properly.  At least, that is the tale I
heard from infra.

On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins  wrote:

> Hey Konstantin,
>
> The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
> which was fixed.  Trees from trunk are compiling against each other
> for me (eg each installed to a local maven repo), perhaps the upstream
> maven repo hasn't been updated with the latest bits yet.
>
> Thanks,
> Eli
>
> On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
>  wrote:
> > Sending this to general to attract urgent attention.
> > Both HDFS and MapReduce are not compiling since
> > HADOOP-6904 and its hdfs and MP counterparts were committed.
> > The problem is not with this patch as described below, but I think those
> > commits should be reversed if Common integration build cannot be
> > restored promptly.
> >
> > Thanks,
> > --Konstantin
> >
> >
> > On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
> > wrote:
> >
> >> I see Hadoop-common-trunk-Commit is failing and not sending any emails.
> >> It times out on native compilation and aborts.
> >> Therefore changes are not integrated, and now it lead to hdfs and
> mapreduce
> >> both not compiling.
> >> Can somebody please take a look at this.
> >> The last few lines of the build are below.
> >>
> >> Thanks
> >> --Konstantin
> >>
> >> [javah] [Loaded
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
> >>
> >> [javah] [Loaded
> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
> >> [javah] [Forcefully writing file
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
> >>
> >>  [exec] checking for gcc... gcc
> >>  [exec] checking whether the C compiler works... yes
> >>  [exec] checking for C compiler default output file name... a.out
> >>  [exec] checking for suffix of executables...
> >>
> >> Build timed out. Aborting
> >> Build was aborted
> >> [FINDBUGS] Skipping publisher since build result is ABORTED
> >> Publishing Javadoc
> >> Archiving artifacts
> >> Recording test results
> >> No test report files were found. Configuration error?
> >>
> >> Recording fingerprints
> >>  [exec] Terminated
> >> Publishing Clover coverage report...
> >> No Clover report will be published due to a Build Failure
> >> No emails were triggered.
> >> Finished: ABORTED
> >>
> >>
> >>
> >>
> >
>


Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Konstantin Shvachko
The current trunks for HDFS and MapReduce are not compiling at the moment. Try to
build trunk.
This is because the changes to the Common API introduced by HADOOP-6904
have not been promoted to the HDFS and MR trunks.
HDFS-1335 and MAPREDUCE-2263 depend on these changes.

Common is not promoted to HDFS and MR because the Hadoop-Common-trunk-Commit
build is broken. See here:
https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-Common-trunk-Commit/

As far as I can see, the last successful build was on 01/19, which integrated
HADOOP-6864.
I think this is when the JNI changes were introduced, which Hudson has not been
able to digest since then.

Could anybody with gcc available please verify whether the problem is caused by
HADOOP-6864?

Thanks,
--Konstantin

On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning  wrote:

> The has been a problem with more than one build failing (Mahout is the one
> that I saw first) due to a change in maven version which meant that the
> clover license isn't being found properly.  At least, that is the tale I
> heard from infra.
>
> On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins  wrote:
>
> > Hey Konstantin,
> >
> > The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
> > which was fixed.  Trees from trunk are compiling against each other
> > for me (eg each installed to a local maven repo), perhaps the upstream
> > maven repo hasn't been updated with the latest bits yet.
> >
> > Thanks,
> > Eli
> >
> > On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
> >  wrote:
> > > Sending this to general to attract urgent attention.
> > > Both HDFS and MapReduce are not compiling since
> > > HADOOP-6904 and its hdfs and MP counterparts were committed.
> > > The problem is not with this patch as described below, but I think
> those
> > > commits should be reversed if Common integration build cannot be
> > > restored promptly.
> > >
> > > Thanks,
> > > --Konstantin
> > >
> > >
> > > On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
> > > wrote:
> > >
> > >> I see Hadoop-common-trunk-Commit is failing and not sending any
> emails.
> > >> It times out on native compilation and aborts.
> > >> Therefore changes are not integrated, and now it lead to hdfs and
> > mapreduce
> > >> both not compiling.
> > >> Can somebody please take a look at this.
> > >> The last few lines of the build are below.
> > >>
> > >> Thanks
> > >> --Konstantin
> > >>
> > >> [javah] [Loaded
> >
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
> > >>
> > >> [javah] [Loaded
> >
> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
> > >> [javah] [Forcefully writing file
> >
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
> > >>
> > >>  [exec] checking for gcc... gcc
> > >>  [exec] checking whether the C compiler works... yes
> > >>  [exec] checking for C compiler default output file name... a.out
> > >>  [exec] checking for suffix of executables...
> > >>
> > >> Build timed out. Aborting
> > >> Build was aborted
> > >> [FINDBUGS] Skipping publisher since build result is ABORTED
> > >> Publishing Javadoc
> > >> Archiving artifacts
> > >> Recording test results
> > >> No test report files were found. Configuration error?
> > >>
> > >> Recording fingerprints
> > >>  [exec] Terminated
> > >> Publishing Clover coverage report...
> > >> No Clover report will be published due to a Build Failure
> > >> No emails were triggered.
> > >> Finished: ABORTED
> > >>
> > >>
> > >>
> > >>
> > >
> >
>


Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Todd Lipcon
On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko
wrote:

>
> Anybody with gcc active could you please verify if the problem is caused by
> HADOOP-6864.
>

I can build common trunk just fine on CentOS 5.5 including native.

I think the issue is somehow isolated to the build machines. Anyone know
what OS they've got? Or can I swing an account on the box where the failures
are happening?

-Todd


> On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning 
> wrote:
>
> > The has been a problem with more than one build failing (Mahout is the
> one
> > that I saw first) due to a change in maven version which meant that the
> > clover license isn't being found properly.  At least, that is the tale I
> > heard from infra.
> >
> > On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins  wrote:
> >
> > > Hey Konstantin,
> > >
> > > The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
> > > which was fixed.  Trees from trunk are compiling against each other
> > > for me (eg each installed to a local maven repo), perhaps the upstream
> > > maven repo hasn't been updated with the latest bits yet.
> > >
> > > Thanks,
> > > Eli
> > >
> > > On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
> > >  wrote:
> > > > Sending this to general to attract urgent attention.
> > > > Both HDFS and MapReduce are not compiling since
> > > > HADOOP-6904 and its hdfs and MP counterparts were committed.
> > > > The problem is not with this patch as described below, but I think
> > those
> > > > commits should be reversed if Common integration build cannot be
> > > > restored promptly.
> > > >
> > > > Thanks,
> > > > --Konstantin
> > > >
> > > >
> > > > On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
> > > > wrote:
> > > >
> > > >> I see Hadoop-common-trunk-Commit is failing and not sending any
> > emails.
> > > >> It times out on native compilation and aborts.
> > > >> Therefore changes are not integrated, and now it lead to hdfs and
> > > mapreduce
> > > >> both not compiling.
> > > >> Can somebody please take a look at this.
> > > >> The last few lines of the build are below.
> > > >>
> > > >> Thanks
> > > >> --Konstantin
> > > >>
> > > >> [javah] [Loaded
> > >
> >
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
> > > >>
> > > >> [javah] [Loaded
> > >
> >
> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
> > > >> [javah] [Forcefully writing file
> > >
> >
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
> > > >>
> > > >>  [exec] checking for gcc... gcc
> > > >>  [exec] checking whether the C compiler works... yes
> > > >>  [exec] checking for C compiler default output file name...
> a.out
> > > >>  [exec] checking for suffix of executables...
> > > >>
> > > >> Build timed out. Aborting
> > > >> Build was aborted
> > > >> [FINDBUGS] Skipping publisher since build result is ABORTED
> > > >> Publishing Javadoc
> > > >> Archiving artifacts
> > > >> Recording test results
> > > >> No test report files were found. Configuration error?
> > > >>
> > > >> Recording fingerprints
> > > >>  [exec] Terminated
> > > >> Publishing Clover coverage report...
> > > >> No Clover report will be published due to a Build Failure
> > > >> No emails were triggered.
> > > >> Finished: ABORTED
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Jakob Homan
By manually installing a new core jar into the cache, I can compile
trunk.  Looks like we just need to kick a new Core into maven.  Are
there instructions somewhere for committers to do this?  I know Nigel
and Owen know how, but I don't know if the knowledge is diffused past
them.
-Jakob


On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko
 wrote:
> Current trunk for HDFS and MapReduce are not compiling at the moment. Try to
> build trunk.
> This is the result of that changes to common api introduced by HADOOP-6904
> are not promoted to HDFS and MR trunks.
> HDFS-1335 and MAPREDUCE-2263 depend on these changes.
>
> Common is not promoted to HDFS and MR because Hadoop-Common-trunk-Commit
> build is broken. See here.
> https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-Common-trunk-Commit/
>
> As I see the last successful build was on 01/19, which integrated
> HADOOP-6864.
> I think this is when JNI changes were introduced, which cannot be digested
> by Hudson since then.
>
> Anybody with gcc active could you please verify if the problem is caused by
> HADOOP-6864.
>
> Thanks,
> --Konstantin
>
> On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning  wrote:
>
>> The has been a problem with more than one build failing (Mahout is the one
>> that I saw first) due to a change in maven version which meant that the
>> clover license isn't being found properly.  At least, that is the tale I
>> heard from infra.
>>
>> On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins  wrote:
>>
>> > Hey Konstantin,
>> >
>> > The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
>> > which was fixed.  Trees from trunk are compiling against each other
>> > for me (eg each installed to a local maven repo), perhaps the upstream
>> > maven repo hasn't been updated with the latest bits yet.
>> >
>> > Thanks,
>> > Eli
>> >
>> > On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
>> >  wrote:
>> > > Sending this to general to attract urgent attention.
>> > > Both HDFS and MapReduce are not compiling since
>> > > HADOOP-6904 and its hdfs and MP counterparts were committed.
>> > > The problem is not with this patch as described below, but I think
>> those
>> > > commits should be reversed if Common integration build cannot be
>> > > restored promptly.
>> > >
>> > > Thanks,
>> > > --Konstantin
>> > >
>> > >
>> > > On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
>> > > wrote:
>> > >
>> > >> I see Hadoop-common-trunk-Commit is failing and not sending any
>> emails.
>> > >> It times out on native compilation and aborts.
>> > >> Therefore changes are not integrated, and now it lead to hdfs and
>> > mapreduce
>> > >> both not compiling.
>> > >> Can somebody please take a look at this.
>> > >> The last few lines of the build are below.
>> > >>
>> > >> Thanks
>> > >> --Konstantin
>> > >>
>> > >>     [javah] [Loaded
>> >
>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
>> > >>
>> > >>     [javah] [Loaded
>> >
>> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
>> > >>     [javah] [Forcefully writing file
>> >
>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
>> > >>
>> > >>      [exec] checking for gcc... gcc
>> > >>      [exec] checking whether the C compiler works... yes
>> > >>      [exec] checking for C compiler default output file name... a.out
>> > >>      [exec] checking for suffix of executables...
>> > >>
>> > >> Build timed out. Aborting
>> > >> Build was aborted
>> > >> [FINDBUGS] Skipping publisher since build result is ABORTED
>> > >> Publishing Javadoc
>> > >> Archiving artifacts
>> > >> Recording test results
>> > >> No test report files were found. Configuration error?
>> > >>
>> > >> Recording fingerprints
>> > >>      [exec] Terminated
>> > >> Publishing Clover coverage report...
>> > >> No Clover report will be published due to a Build Failure
>> > >> No emails were triggered.
>> > >> Finished: ABORTED
>> > >>
>> > >>
>> > >>
>> > >>
>> > >
>> >
>>
>


Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Milind Bhandarkar
Owen,

I am surprised to not see jute (aka hadoop recordio) on this list.

- milind

On Jan 30, 2011, at 11:19 PM, Owen O'Malley wrote:

> 
> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
> 
>> Now that http://apache-extras.org is launched 
>> (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
>>  I'd like to start a discussion on moving contrib components out of common, 
>> mapreduce, and hdfs.
> 
> The PMC can't "move" code to Apache extras. It can only choose to abandon 
> code that it doesn't want to support any longer. As a separate action some 
> group of developers may create projects in Apache Extras based on the code 
> from Hadoop.
> 
> Therefore the question is really what if any code Hadoop wants to abandon. 
> That is a good question and one that we should ask ourselves occasionally.
> 
> After a quick consideration, my personal list would look like:
> 
> failmon
> fault injection
> fuse-dfs
> hod
> kfs
> 
> Also note that pushing code out of Hadoop has a high cost. There are at least 
> 3 forks of the hadoop-gpl-compression code. That creates a lot of confusion 
> for the users. A lot of users never go to the work to figure out which fork 
> and branch of hadoop-gpl-compression work with the version of Hadoop they 
> installed.
> 
> -- Owen
> 

---
Milind Bhandarkar
mbhandar...@linkedin.com





Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Todd Lipcon
On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley  wrote:

>
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never go to the work to figure out
> which fork and branch of hadoop-gpl-compression work with the version of
> Hadoop they installed.
>
>
Indeed it creates confusion, but in my opinion it has been very successful
modulo that confusion.

In particular, Kevin and I (who each have a repo on github but basically
co-maintain a branch) have done about 8 bugfix releases of LZO in the last
year. The ability to take a bug and turn it around into a release within a
few days has been very beneficial to the users. If it were part of core
Hadoop, people would be forced to live with these blocker bugs for months at
a time between dot releases.

IMO the more we can take non-core components and move them to separate
release timelines, the better. Yes, it is harder for users, but it also is
easier for them when they hit a bug - they don't have to wait months for a
wholesale upgrade which might contain hundreds of other changes to core
components. I think this will also help the situation where people have set
up shop on branches -- a lot of the value of these branches comes from the
frequency of backports and bugfixes to "non-core" components. If the
non-core stuff were on a faster timeline upstream, we could maintain core
stability while also offering people the latest and greatest libraries,
tools, codecs, etc.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Giridharan Kesavan
ant mvn-deploy will publish the snapshot artifacts to the Apache Maven repository 
as long as you have the right credentials in ~/.m2/settings.xml.

For a settings.xml template, please look at http://wiki.apache.org/hadoop/HowToRelease

I'm pushing the latest common artifacts now.

-Giri



On Jan 31, 2011, at 3:11 PM, Jakob Homan wrote:

> By manually installing a new core jar into the cache, I can compile
> trunk.  Looks like we just need to kick a new Core into maven.  Are
> there instructions somewhere for committers to do this?  I know Nigel
> and Owen know how, but I don't know if the knowledge is diffused past
> them.
> -Jakob
> 
> 
> On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko
>  wrote:
>> Current trunk for HDFS and MapReduce are not compiling at the moment. Try to
>> build trunk.
>> This is the result of that changes to common api introduced by HADOOP-6904
>> are not promoted to HDFS and MR trunks.
>> HDFS-1335 and MAPREDUCE-2263 depend on these changes.
>> 
>> Common is not promoted to HDFS and MR because Hadoop-Common-trunk-Commit
>> build is broken. See here.
>> https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-Common-trunk-Commit/
>> 
>> As I see the last successful build was on 01/19, which integrated
>> HADOOP-6864.
>> I think this is when JNI changes were introduced, which cannot be digested
>> by Hudson since then.
>> 
>> Anybody with gcc active could you please verify if the problem is caused by
>> HADOOP-6864.
>> 
>> Thanks,
>> --Konstantin
>> 
>> On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning  wrote:
>> 
>>> The has been a problem with more than one build failing (Mahout is the one
>>> that I saw first) due to a change in maven version which meant that the
>>> clover license isn't being found properly.  At least, that is the tale I
>>> heard from infra.
>>> 
>>> On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins  wrote:
>>> 
 Hey Konstantin,
 
 The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
 which was fixed.  Trees from trunk are compiling against each other
 for me (eg each installed to a local maven repo), perhaps the upstream
 maven repo hasn't been updated with the latest bits yet.
 
 Thanks,
 Eli
 
 On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
  wrote:
> Sending this to general to attract urgent attention.
> Both HDFS and MapReduce are not compiling since
> HADOOP-6904 and its hdfs and MP counterparts were committed.
> The problem is not with this patch as described below, but I think
>>> those
> commits should be reversed if Common integration build cannot be
> restored promptly.
> 
> Thanks,
> --Konstantin
> 
> 
> On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
> wrote:
> 
>> I see Hadoop-common-trunk-Commit is failing and not sending any
>>> emails.
>> It times out on native compilation and aborts.
>> Therefore changes are not integrated, and now it lead to hdfs and
 mapreduce
>> both not compiling.
>> Can somebody please take a look at this.
>> The last few lines of the build are below.
>> 
>> Thanks
>> --Konstantin
>> 
>> [javah] [Loaded
 
>>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
>> 
>> [javah] [Loaded
 
>>> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
>> [javah] [Forcefully writing file
 
>>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
>> 
>>  [exec] checking for gcc... gcc
>>  [exec] checking whether the C compiler works... yes
>>  [exec] checking for C compiler default output file name... a.out
>>  [exec] checking for suffix of executables...
>> 
>> Build timed out. Aborting
>> Build was aborted
>> [FINDBUGS] Skipping publisher since build result is ABORTED
>> Publishing Javadoc
>> Archiving artifacts
>> Recording test results
>> No test report files were found. Configuration error?
>> 
>> Recording fingerprints
>>  [exec] Terminated
>> Publishing Clover coverage report...
>> No Clover report will be published due to a Build Failure
>> No emails were triggered.
>> Finished: ABORTED
>> 
>> 
>> 
>> 
> 
 
>>> 
>> 



Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Konstantin Shvachko
Giri
looks like the last run you started failed the same way as previous ones.
Any thoughts on what's going on?
Thanks,
--Konstantin

On Mon, Jan 31, 2011 at 3:33 PM, Giridharan Kesavan
wrote:

> ant mvn-deploy would publish snapshot artifact to the apache maven
> repository as long you have the right credentials in ~/.m2/settings.xml.
>
> For settings.xml template pls look at
> http://wiki.apache.org/hadoop/HowToRelease
>
> I'm pushing the latest common artifacts now.
>
> -Giri
>
>
>
> On Jan 31, 2011, at 3:11 PM, Jakob Homan wrote:
>
> > By manually installing a new core jar into the cache, I can compile
> > trunk.  Looks like we just need to kick a new Core into maven.  Are
> > there instructions somewhere for committers to do this?  I know Nigel
> > and Owen know how, but I don't know if the knowledge is diffused past
> > them.
> > -Jakob
> >
> >
> > On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko
> >  wrote:
> >> Current trunk for HDFS and MapReduce are not compiling at the moment.
> Try to
> >> build trunk.
> >> This is the result of that changes to common api introduced by
> HADOOP-6904
> >> are not promoted to HDFS and MR trunks.
> >> HDFS-1335 and MAPREDUCE-2263 depend on these changes.
> >>
> >> Common is not promoted to HDFS and MR because Hadoop-Common-trunk-Commit
> >> build is broken. See here.
> >>
> https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-Common-trunk-Commit/
> >>
> >> As I see the last successful build was on 01/19, which integrated
> >> HADOOP-6864.
> >> I think this is when JNI changes were introduced, which cannot be
> digested
> >> by Hudson since then.
> >>
> >> Anybody with gcc active could you please verify if the problem is caused
> by
> >> HADOOP-6864.
> >>
> >> Thanks,
> >> --Konstantin
> >>
> >> On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning 
> wrote:
> >>
> >>> The has been a problem with more than one build failing (Mahout is the
> one
> >>> that I saw first) due to a change in maven version which meant that the
> >>> clover license isn't being found properly.  At least, that is the tale
> I
> >>> heard from infra.
> >>>
> >>> On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins  wrote:
> >>>
>  Hey Konstantin,
> 
>  The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
>  which was fixed.  Trees from trunk are compiling against each other
>  for me (eg each installed to a local maven repo), perhaps the upstream
>  maven repo hasn't been updated with the latest bits yet.
> 
>  Thanks,
>  Eli
> 
>  On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
>   wrote:
> > Sending this to general to attract urgent attention.
> > Both HDFS and MapReduce are not compiling since
> > HADOOP-6904 and its hdfs and MP counterparts were committed.
> > The problem is not with this patch as described below, but I think
> >>> those
> > commits should be reversed if Common integration build cannot be
> > restored promptly.
> >
> > Thanks,
> > --Konstantin
> >
> >
> > On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
> > wrote:
> >
> >> I see Hadoop-common-trunk-Commit is failing and not sending any
> >>> emails.
> >> It times out on native compilation and aborts.
> >> Therefore changes are not integrated, and now it lead to hdfs and
>  mapreduce
> >> both not compiling.
> >> Can somebody please take a look at this.
> >> The last few lines of the build are below.
> >>
> >> Thanks
> >> --Konstantin
> >>
> >> [javah] [Loaded
> 
> >>>
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
> >>
> >> [javah] [Loaded
> 
> >>>
> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
> >> [javah] [Forcefully writing file
> 
> >>>
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
> >>
> >>  [exec] checking for gcc... gcc
> >>  [exec] checking whether the C compiler works... yes
> >>  [exec] checking for C compiler default output file name...
> a.out
> >>  [exec] checking for suffix of executables...
> >>
> >> Build timed out. Aborting
> >> Build was aborted
> >> [FINDBUGS] Skipping publisher since build result is ABORTED
> >> Publishing Javadoc
> >> Archiving artifacts
> >> Recording test results
> >> No test report files were found. Configuration error?
> >>
> >> Recording fingerprints
> >>  [exec] Terminated
> >> Publishing Clover coverage report...
> >> No Clover report will be published due to a Build Failure
> >> No emails were triggered.
> >> Finished: ABORTED
> >>
> >>
> >>
> >>
> >
> 
> >>>
> >>
>
>


Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Giridharan Kesavan
Konstantin,

I think I need to restart the slave that is running the commit build. For now 
I have published the common artifact manually from the command line.

Thanks,
Giri

On Jan 31, 2011, at 4:27 PM, Konstantin Shvachko wrote:

> Giri
> looks like the last run you started failed the same way as previous ones.
> Any thoughts on what's going on?
> Thanks,
> --Konstantin
> 
> On Mon, Jan 31, 2011 at 3:33 PM, Giridharan Kesavan
> wrote:
> 
>> ant mvn-deploy would publish snapshot artifact to the apache maven
>> repository as long you have the right credentials in ~/.m2/settings.xml.
>> 
>> For settings.xml template pls look at
>> http://wiki.apache.org/hadoop/HowToRelease
>> 
>> I'm pushing the latest common artifacts now.
>> 
>> -Giri
>> 
>> 
>> 
>> On Jan 31, 2011, at 3:11 PM, Jakob Homan wrote:
>> 
>>> By manually installing a new core jar into the cache, I can compile
>>> trunk.  Looks like we just need to kick a new Core into maven.  Are
>>> there instructions somewhere for committers to do this?  I know Nigel
>>> and Owen know how, but I don't know if the knowledge is diffused past
>>> them.
>>> -Jakob
>>> 
>>> 
>>> On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko
>>>  wrote:
 Current trunk for HDFS and MapReduce are not compiling at the moment.
>> Try to
 build trunk.
 This is the result of that changes to common api introduced by
>> HADOOP-6904
 are not promoted to HDFS and MR trunks.
 HDFS-1335 and MAPREDUCE-2263 depend on these changes.
 
 Common is not promoted to HDFS and MR because Hadoop-Common-trunk-Commit
 build is broken. See here.
 
>> https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-Common-trunk-Commit/
 
 As I see the last successful build was on 01/19, which integrated
 HADOOP-6864.
 I think this is when JNI changes were introduced, which cannot be
>> digested
 by Hudson since then.
 
 Anybody with gcc active could you please verify if the problem is caused
>> by
 HADOOP-6864.
 
 Thanks,
 --Konstantin
 
 On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning 
>> wrote:
 
> The has been a problem with more than one build failing (Mahout is the
>> one
> that I saw first) due to a change in maven version which meant that the
> clover license isn't being found properly.  At least, that is the tale
>> I
> heard from infra.
> 
> On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins  wrote:
> 
>> Hey Konstantin,
>> 
>> The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
>> which was fixed.  Trees from trunk are compiling against each other
>> for me (eg each installed to a local maven repo), perhaps the upstream
>> maven repo hasn't been updated with the latest bits yet.
>> 
>> Thanks,
>> Eli
>> 
>> On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
>>  wrote:
>>> Sending this to general to attract urgent attention.
>>> Both HDFS and MapReduce are not compiling since
>>> HADOOP-6904 and its hdfs and MP counterparts were committed.
>>> The problem is not with this patch as described below, but I think
> those
>>> commits should be reversed if Common integration build cannot be
>>> restored promptly.
>>> 
>>> Thanks,
>>> --Konstantin
>>> 
>>> 
>>> On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
>>> wrote:
>>> 
 I see Hadoop-common-trunk-Commit is failing and not sending any
> emails.
 It times out on native compilation and aborts.
 Therefore changes are not integrated, and now it lead to hdfs and
>> mapreduce
 both not compiling.
 Can somebody please take a look at this.
 The last few lines of the build are below.
 
 Thanks
 --Konstantin
 
[javah] [Loaded
>> 
> 
>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]
 
[javah] [Loaded
>> 
> 
>> /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
[javah] [Forcefully writing file
>> 
> 
>> /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
 
 [exec] checking for gcc... gcc
 [exec] checking whether the C compiler works... yes
 [exec] checking for C compiler default output file name...
>> a.out
 [exec] checking for suffix of executables...
 
 Build timed out. Aborting
 Build was aborted
 [FINDBUGS] Skipping publisher since build result is ABORTED
 Publishing Javadoc
 Archiving artifacts
 Recording test results
 No test report files were found. Configuration error?
 
>>>

Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Konstantin Shvachko
Thanks, Giri.
--Konst

On Mon, Jan 31, 2011 at 4:40 PM, Giridharan Kesavan
wrote:

> Konstantin,
>
> I think I need to restart the slave which is running the commit build. For
> now I have published the common artifact manually from commandline.
>
> Thanks,
> Giri
>
> On Jan 31, 2011, at 4:27 PM, Konstantin Shvachko wrote:
>
> > Giri
> > looks like the last run you started failed the same way as previous ones.
> > Any thoughts on what's going on?
> > Thanks,
> > --Konstantin
> >
> > On Mon, Jan 31, 2011 at 3:33 PM, Giridharan Kesavan
> > wrote:
> >
> >> ant mvn-deploy would publish snapshot artifact to the apache maven
> >> repository as long you have the right credentials in ~/.m2/settings.xml.
> >>
> >> For settings.xml template pls look at
> >> http://wiki.apache.org/hadoop/HowToRelease
> >>
> >> I'm pushing the latest common artifacts now.
> >>
> >> -Giri
> >>
> >>
> >>
> >> On Jan 31, 2011, at 3:11 PM, Jakob Homan wrote:
> >>
> >>> By manually installing a new core jar into the cache, I can compile
> >>> trunk.  Looks like we just need to kick a new Core into maven.  Are
> >>> there instructions somewhere for committers to do this?  I know Nigel
> >>> and Owen know how, but I don't know if the knowledge is diffused past
> >>> them.
> >>> -Jakob
> >>>
> >>>
> >>> On Mon, Jan 31, 2011 at 1:57 PM, Konstantin Shvachko
> >>>  wrote:
>  Current trunk for HDFS and MapReduce are not compiling at the moment.
> >> Try to
>  build trunk.
>  This is the result of that changes to common api introduced by
> >> HADOOP-6904
>  are not promoted to HDFS and MR trunks.
>  HDFS-1335 and MAPREDUCE-2263 depend on these changes.
> 
>  Common is not promoted to HDFS and MR because the Hadoop-Common-trunk-Commit
>  build is broken. See here:
> 
>  https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-Common-trunk-Commit/
> 
>  As far as I can see, the last successful build was on 01/19, which integrated
>  HADOOP-6864.
>  I think that is when the JNI changes were introduced, and Hudson has not been
>  able to digest them since then.
> 
>  Could anybody with a working gcc please verify whether the problem is caused
>  by HADOOP-6864?
> 
>  Thanks,
>  --Konstantin
> 
>  On Mon, Jan 31, 2011 at 1:36 PM, Ted Dunning 
> >> wrote:
> 
> > There has been a problem with more than one build failing (Mahout is the one
> > that I saw first) due to a change in the Maven version, which meant that the
> > Clover license isn't being found properly.  At least, that is the tale I
> > heard from infra.
> >
> > On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins 
> wrote:
> >
> >> Hey Konstantin,
> >>
> >> The only build breakage I saw from HADOOP-6904 is MAPREDUCE-2290,
> >> which was fixed.  Trees from trunk are compiling against each other
> >> for me (eg each installed to a local maven repo), perhaps the upstream
> >> maven repo hasn't been updated with the latest bits yet.
> >>
> >> Thanks,
> >> Eli
> >>
> >> On Mon, Jan 31, 2011 at 12:14 PM, Konstantin Shvachko
> >>  wrote:
> >>> Sending this to general to attract urgent attention.
> >>> Both HDFS and MapReduce are not compiling since
> >>> HADOOP-6904 and its HDFS and MapReduce counterparts were committed.
> >>> The problem is not with this patch as described below, but I think those
> >>> commits should be reversed if the Common integration build cannot be
> >>> restored promptly.
> >>>
> >>> Thanks,
> >>> --Konstantin
> >>>
> >>>
> >>> On Fri, Jan 28, 2011 at 5:53 PM, Konstantin Shvachko
> >>> wrote:
> >>>
 I see Hadoop-common-trunk-Commit is failing and not sending any emails.
 It times out on native compilation and aborts.
 Therefore changes are not integrated, and now it has led to HDFS and MapReduce
 both not compiling.
>  Can somebody please take a look at this.
>  The last few lines of the build are below.
> 
>  Thanks
>  --Konstantin
> 
    [javah] [Loaded /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/classes/org/apache/hadoop/security/JniBasedUnixGroupsMapping.class]

    [javah] [Loaded /homes/hudson/tools/java/jdk1.6.0_11-32/jre/lib/rt.jar(java/lang/Object.class)]
    [javah] [Forcefully writing file /grid/0/hudson/hudson-slave/workspace/Hadoop-Common-trunk-Commit/trunk/build/native/Linux-i386-32/src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h]
> 
>  [exec] checking for gcc... gcc
>  [exec] checking whether the C compiler works... yes
 [exec] checking for C compiler default output file name... a.out
>  [exec] che
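
For reference, the kind of ~/.m2/settings.xml entry Giri refers to above looks
roughly like the sketch below. The server id, username, and password are
placeholders (the authoritative template is the HowToRelease wiki page), and the
id has to match whatever repository id the Hadoop build actually deploys to:

    <settings>
      <servers>
        <server>
          <!-- placeholder id; use the repository id from the HowToRelease template -->
          <id>apache.snapshots.https</id>
          <username>your-apache-id</username>
          <password>your-apache-password</password>
        </server>
      </servers>
    </settings>

With credentials like these in place, ant mvn-deploy can authenticate when
publishing the snapshot artifacts. Without them, the workaround Jakob describes
(installing the freshly built common jar into the local repository by hand, e.g.
with Maven's install:install-file goal) is the only way to get HDFS and
MapReduce compiling against the new Common API.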

[ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-01-31 Thread Eric Baldeschwieler
Hi Folks,

I'm pleased to announce that after some reflection, Yahoo! has decided to 
discontinue the "Yahoo! Distribution of Hadoop" and focus on Apache Hadoop. 
 We plan to remove all references to a Yahoo distribution from our website 
(developer.yahoo.com/hadoop), close our github repo 
(yahoo.github.com/hadoop-common) and focus on working more closely with the 
Apache community.  Our intent is to return to helping Apache produce binary 
releases of Apache Hadoop that are so bullet proof that Yahoo and other 
production Hadoop users can run them unpatched on their clusters.

Until Hadoop 0.20, Yahoo committers worked as release masters to produce binary 
Apache Hadoop releases that the entire community used on their clusters.  As 
the community grew, we experimented with using the "Yahoo! Distribution of 
Hadoop" as the vehicle to share our work.  Unfortunately, Apache is no longer 
the obvious place to go for Hadoop releases.  The Yahoo! team wants to return 
to a world where anyone can download and directly use releases of Hadoop from 
Apache.  We want to contribute to the stabilization and testing of those 
releases.  We also want to share our ongoing program of sustaining engineering 
that backports minor feature enhancements into new dot releases, so that the 
world sees regular improvements coming from Apache every few months, not years.

Recently the Apache Hadoop community has been very turbulent.  Over the last 
few months we have been developing Hadoop enhancements in our internal git 
repository while doing a complete review of our options. Our commitment to open 
sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd), but the 
future of the "Yahoo distribution of Hadoop" was far from clear.  We've 
concluded that focusing on Apache Hadoop is the way forward.  We believe that 
more focus on communicating our goals to the Apache Hadoop community, and more 
willingness to compromise on how we get to those goals, will help us get back 
to making Hadoop even better.

Unfortunately, we now have to sort out how to contribute several person-years 
worth of work to Apache to let us unwind the Yahoo! git repositories.  We 
currently run two lines of Hadoop development, our sustaining program 
(hadoop-0.20-sustaining) and hadoop-future.  Hadoop-0.20-sustaining is the 
stable version of Hadoop we currently run on Yahoo's 40,000 nodes.  It contains 
a series of fixes and enhancements that are all backwards compatible with our 
"Hadoop 0.20 with security".  It is our most stable and high performance 
release of Hadoop ever.  We've expended a lot of energy finding and fixing bugs 
in it this year. We have initiated the process of contributing this work to 
Apache in the branch: hadoop/common/branches/branch-0.20-security.  We've 
proposed calling this the 20.100 release.  Once folks have had a chance to try 
this out and we've had a chance to respond to their feedback, we plan to create 
20.100 release candidates and ask the community to vote on making them Apache 
releases. 

Hadoop-future is our new feature branch.  We are working on a set of new 
features for Hadoop to improve its availability, scalability and 
interoperability to make Hadoop more usable in mission critical deployments. 
You're going to see another burst of email activity from us as we work to get 
hadoop-future patches socialized, reviewed and checked in.  These bulk checkins 
are an exception; they are the result of us striving to be more transparent.  
Once we've merged our hadoop-future and hadoop-0.20-sustaining work back into 
Apache, folks can expect us to return to our regular development cadence.  
Looking forward, we plan to socialize our roadmaps regularly, actively 
synchronize our work with other active Hadoop contributors and develop our code 
collaboratively, directly in Apache.

In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" is 
a commitment to working more effectively with the Apache Hadoop community.  Our 
goal is to make Apache Hadoop THE open source platform for big data.

Thanks,

E14

--

PS Here is a draft list of key features in hadoop-future:

* HDFS-1052 - Federation, the ability to support much more storage per Hadoop 
cluster.

* HADOOP-6728 - The new metrics framework

* MAPREDUCE-1220 - Optimizations for small jobs

---
PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W

Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

2011-01-31 Thread Jeff Hammerbacher
Excellent news! Will you make Howl, Oozie, and Yarn Apache projects as
well?



Re: MRUnit

2011-01-31 Thread Aaron Kimball
+1 - thanks for taking the initiative to clean this up!

You should file a JIRA with a patch that removes src/contrib/mrunit and
removes it from src/contrib/build.xml.

- Aaron




On Sun, Jan 30, 2011 at 7:59 PM, Nigel Daley  wrote:

> +1.  I just started a thread on moving all components out of contrib.
>
> nige
>
> On Jan 30, 2011, at 6:50 PM, Jeff Hammerbacher wrote:
>
> >>
> >> I think it makes sense to remove mrunit from contrib (after a
> >> deprecation period) but I'm curious to hear others' opinions.
> >
> >
> > Oh please do take things out of contrib. I'd love to see all of it move to
> > Github. Thanks for putting in the time for MRUnit.
>
>


Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Aaron Kimball
+1 to this process in general.

In particular, tools like MRUnit can benefit from having an independent
release due to where they are used in a project's lifecycle. MRUnit should
be specified as a test dependency, whereas Hadoop itself is a
compile/runtime dependency.  As it stands, there isn't an easy way to manage
that distinction.  An independent release cycle would increase flexibility for
this tool, and probably for others as well.
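
To make the distinction concrete, a downstream project's pom.xml would look
roughly like the sketch below (the MRUnit coordinates are illustrative, since no
separate MRUnit artifact is published yet; only the scopes matter here):

    <dependencies>
      <!-- Hadoop itself: needed at compile time and at runtime -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>0.20.2</version>
      </dependency>
      <!-- MRUnit: only needed to compile and run unit tests -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>mrunit</artifactId>
        <version>0.20.2</version>
        <scope>test</scope>
      </dependency>
    </dependencies>

With MRUnit on its own release cycle, the test-scoped dependency could be bumped
independently of the Hadoop version a cluster actually runs.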

- Aaron

On Mon, Jan 31, 2011 at 3:23 PM, Todd Lipcon  wrote:

> On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley 
> wrote:
>
> >
> > Also note that pushing code out of Hadoop has a high cost. There are at
> > least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> > confusion for the users. A lot of users never go to the work to figure out
> > which fork and branch of hadoop-gpl-compression work with the version of
> > Hadoop they installed.
> >
> >
> Indeed it creates confusion, but in my opinion it has been very successful
> modulo that confusion.
>
> In particular, Kevin and I (who each have a repo on github but basically
> co-maintain a branch) have done about 8 bugfix releases of LZO in the last
> year. The ability to take a bug and turn it around into a release within a
> few days has been very beneficial to the users. If it were part of core
> Hadoop, people would be forced to live with these blocker bugs for months at
> a time between dot releases.
>
> IMO the more we can take non-core components and move them to separate
> release timelines, the better. Yes, it is harder for users, but it also is
> easier for them when they hit a bug - they don't have to wait months for a
> wholesale upgrade which might contain hundreds of other changes to core
> components. I think this will also help the situation where people have set
> up shop on branches -- a lot of the value of these branches comes from the
> frequency of backports and bugfixes to "non-core" components. If the
> non-core stuff were on a faster timeline upstream, we could maintain core
> stability while also offering people the latest and greatest libraries,
> tools, codecs, etc.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>