[jira] [Created] (HDFS-4467) Segmentation fault in libhdfs while connecting to HDFS and running a Hive Query

2013-02-04 Thread Shubhangi Garg (JIRA)
Shubhangi Garg created HDFS-4467:


 Summary: Segmentation fault in libhdfs while connecting to HDFS 
and running a Hive Query
 Key: HDFS-4467
 URL: https://issues.apache.org/jira/browse/HDFS-4467
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 1.0.4
 Environment: Ubuntu 12.04, application in C++
Reporter: Shubhangi Garg


Connecting to HDFS using the compiled libhdfs library causes a segmentation
fault and memory leaks; both are easily verifiable with valgrind.

Even a simple application program given below has memory leaks:


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Release numbering for branch-2 releases

2013-02-04 Thread Suresh Srinivas
On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy a...@hortonworks.com wrote:


 On Feb 1, 2013, at 2:34 AM, Tom White wrote:
  Whereas Arun is proposing
 
   2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.1.0-alpha, 2.2.0-beta, 2.3.0
 
  and the casual observer might expect there to be a stable 2.0.1 (say)
  on seeing the existence of 2.0.2-alpha.
 
  The first three of these are already released, so I don't think we
  could switch to the Semantic Versioning scheme at this stage. We could
  for release 3 though.
 

 I agree that would have been slightly better, unfortunately it's too late
 now - a new versioning scheme would be even more confusing!

 Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
 stable release? This way we just have one series (2.0.x) which is not
 suitable for general consumption.

 I'm ok either way, but I want to just make a decision and move on to
 making the release asap, appreciate a quick resolution.


+1 for 2.0.3-alpha. 2.0.3-alpha has been the release number that we have
been working on for a while. I am surprised to see the feedback that it is
confusing.

Let's move forward constructively, make a decision, and send the release
out quickly. Arun, my suggestion is to call for a release vote.

Regards,
Suresh




-- 
http://hortonworks.com/download/


Re: Release numbering for branch-2 releases

2013-02-04 Thread Stack
On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy a...@hortonworks.com wrote:

 Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
 stable release? This way we just have one series (2.0.x) which is not
 suitable for general consumption.



That contains the versioning damage to the 2.0.x set. This is an
improvement over the original proposal, where we let the versioning mayhem
run out to 2.3.

Thanks Arun,
St.Ack


Re: Release numbering for branch-2 releases

2013-02-04 Thread Owen O'Malley
I think that using -(alpha,beta) tags on the release versions is a really
bad idea. All releases should follow the strictly numeric
(Major.Minor.Patch) pattern that we've used for all of the releases except
the 2.0.x ones.

-- Owen
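To make the two schemes in this thread concrete, here is a minimal Python sketch (hypothetical; it is not Hadoop's or Maven's version-comparison logic) that parses a Major.Minor.Patch string with an optional -alpha or -beta tag and orders releases so that a tagged build sorts before the untagged release with the same numbers. The ranking alpha < beta < untagged is an assumption for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed ranking for illustration: alpha < beta < untagged (GA) release.
QUALIFIER_RANK = {"alpha": 0, "beta": 1, None: 2}

# Major.Minor.Patch, optionally followed by -alpha or -beta.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    qualifier: Optional[str]

    def sort_key(self):
        # Numbers first; the tag only breaks ties between identical numbers.
        return (self.major, self.minor, self.patch, QUALIFIER_RANK[self.qualifier])

    def is_strictly_numeric(self):
        # True for a plain Major.Minor.Patch release with no tag.
        return self.qualifier is None


def parse(text):
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a Major.Minor.Patch[-alpha|-beta] version: {text!r}")
    major, minor, patch, qualifier = match.groups()
    return Version(int(major), int(minor), int(patch), qualifier)


def sort_versions(versions):
    # Sort version strings by their parsed keys, not lexicographically.
    return sorted(versions, key=lambda v: parse(v).sort_key())
```

Under a strictly numeric scheme every release would satisfy is_strictly_numeric(), so maturity would have to be signalled somewhere other than the version string.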





[jira] [Created] (HDFS-4468) Fix test failure for HADOOP-9252

2013-02-04 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-4468:


 Summary: Fix test failure for HADOOP-9252
 Key: HDFS-4468
 URL: https://issues.apache.org/jira/browse/HDFS-4468
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor


HADOOP-9252 slightly changes the format of some StringUtils outputs.  It may 
cause test failures.

Also, some methods were deprecated by HADOOP-9252. Their uses should be
replaced with the new methods.



Re: Release numbering for branch-2 releases

2013-02-04 Thread Suresh Srinivas
On Mon, Feb 4, 2013 at 1:07 PM, Owen O'Malley omal...@apache.org wrote:

 I think that using -(alpha,beta) tags on the release versions is a really
 bad idea.


Why? Can you please share some reasons?

I actually think alpha, beta and stable/GA are a much better way to set
expectations about the quality of a release. This has been the practice in
software release cycles for a long time. Having the option to release an
alpha is good for releasing early and getting feedback from people who can
try it out, while at the same time warning other, less adventurous users
about quality expectations.

Or are you proposing that any release not marked stable (currently 1.x) is
implicitly alpha/beta?

 All releases should follow the strictly numeric
 (Major.Minor.Patch) pattern that we've used for all of the releases except
 the 2.0.x ones.

 -- Owen




Re: Release numbering for branch-2 releases

2013-02-04 Thread Todd Lipcon
On Mon, Feb 4, 2013 at 2:14 PM, Suresh Srinivas sur...@hortonworks.com wrote:


 Why? Can you please share some reasons?

 I actually think alpha, beta and stable/GA are a much better way to set
 expectations about the quality of a release. This has been the practice in
 software release cycles for a long time. Having the option to release an
 alpha is good for releasing early and getting feedback from people who can
 try it out, while at the same time warning other, less adventurous users
 about quality expectations.


My issue with the current scheme is that there is little definition of
what alpha/beta/stable mean. We're trying to boil a complex issue down
into a simple tag that doesn't capture the various subtleties well. For
example, different people may use the terms to describe:

- Quality/completeness: for example, missing docs, buggy UIs, difficult
setup/install, etc.
- Safety: for example, potential bugs which may risk data loss
- Stability: for example, potential bugs which may risk uptime
- End-user API compatibility: will user-facing APIs change in this version?
(affecting those who write MR jobs)
- Framework-developer API compatibility: will YARN-internal APIs change in
this version? (affecting those who write non-MR YARN frameworks)
- Binary compatibility: can I continue to use my application (or YARN
framework) compiled against an old version with this version, without a
recompile?
- Intra-cluster wire compatibility: can I rolling-upgrade from A to B?
- Client-server wire compatibility: can I use old clients to talk to an
upgraded cluster?

Depending on the user's expectations and needs, different factors above may
be significantly more or less important. And different portions of the
software may have different levels of stability in each of the areas. As
I've mentioned in previous threads, my experience supporting production
Hadoop 1.x and Hadoop 2.x HDFS clusters has led me to believe that 2.x,
while alpha, is significantly less prone to data-loss bugs than Hadoop 1.x.
But, with some of the changes in the proposed 2.0.3-alpha, it
wouldn't be wire-protocol-stable.

How can we best devise a scheme that explains the various factors above in
a more detailed way than one big red warning sticker? Which of the above
factors does the community think would be implied by GA?
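One way to read this question is as a request for a per-dimension statement instead of a single tag. The following is a minimal, hypothetical Python sketch of such a release descriptor: the field names mirror the list above, but the Level values and the meets() rule for GA are assumptions for illustration, not anything the community has agreed on.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Level(Enum):
    # Hypothetical stability levels for a single dimension.
    UNSTABLE = 0
    EVOLVING = 1
    STABLE = 2


@dataclass(frozen=True)
class ReleaseProfile:
    # One field per dimension in the list above.
    quality: Level
    safety: Level
    stability: Level
    end_user_api: Level
    framework_api: Level
    binary_compat: Level
    intra_cluster_wire: Level
    client_server_wire: Level

    def levels(self):
        # Map of dimension name -> Level, in declaration order.
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def weakest(self):
        # Sorted names of the dimensions sitting at the lowest level.
        lowest = min(level.value for level in self.levels().values())
        return sorted(n for n, lvl in self.levels().items() if lvl.value == lowest)

    def meets(self, required):
        # Assumed GA rule: every dimension named in `required` is STABLE.
        return all(self.levels()[name] is Level.STABLE for name in required)
```

A release announcement could then list weakest() rather than one big warning sticker; deciding which dimension names go into the `required` list for GA is exactly the open question. Any values plugged into a ReleaseProfile would be illustrative only, not an assessment of a real release.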

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Release numbering for branch-2 releases

2013-02-04 Thread Steve Loughran
Disclaimer: personal opinions only; I just can't be bothered to subscribe
with my @apache.org address right now.

On 4 February 2013 14:36, Todd Lipcon t...@cloudera.com wrote:

 - Quality/completeness: for example, missing docs, buggy UIs, difficult
 setup/install, etc


Par for the course. Have you ever used Linux?


 - Safety: for example, potential bugs which may risk data loss


Anything that threatens data loss is a blocker, at least for data you care
about.


 - Stability: for example, potential bugs which may risk uptime


Less critical for most people, though it can cost lots of $$.


 - End-user API compatibility: will user-facing APIs change in this version?
 (affecting those who write MR jobs)




 - Framework-developer API compatibility: will YARN-internal APIs change in
 this version? (affecting those who write non-MR YARN frameworks)


Things aren't stable in 2.x there yet; YARN-117 is on my to-do list, and
without it I consider it broken. The ASF hasn't shipped a non-alpha
version of this, and I don't think anyone else has made any stability
claims either. That includes CDH 4.x, where YARN was a "play if you want"
feature, or "wide alpha", as I viewed it.


 - Binary compatibility: can I continue to use my application (or YARN
 framework) compiled against an old version with this version, without a
 recompile?


This is one thing computer science has never fully addressed. The entire
computing stack has to be considered best-effort. If there is one thing we
can do here, it is hooking up the entire set of OSS apps to the nightly
build, in a nice DAG including things like Cascading, Spring Data, etc., the
way Apache Gump did to act as the regression test for Ant (before Maven
broke it).
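The nightly-build idea boils down to building downstream projects in dependency order. Here is a minimal Python sketch of that ordering using Kahn's topological sort; the function name and the project graph in the test are hypothetical illustrations, not any real Gump or Hadoop build configuration.

```python
from collections import deque


def build_order(depends_on):
    """Return projects ordered so each comes after everything it builds against.

    depends_on maps a project name to the names of the projects it depends on.
    Raises ValueError if the graph has a cycle (no valid nightly order exists).
    """
    # Every project mentioned anywhere is a node in the graph.
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes.update(deps)

    # indegree = number of not-yet-built dependencies; dependents = reverse edges.
    indegree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for project, deps in depends_on.items():
        for dep in set(deps):
            indegree[project] += 1
            dependents[dep].append(project)

    # Kahn's algorithm; sorting keeps the result deterministic for a given input.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(dependents[current]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle: no valid build order")
    return order
```

A project only enters the ready queue once all of its upstream builds have been emitted, which is what lets a broken upstream change show up as a downstream regression the same night.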


 - Intra-cluster wire compatibility: can I rolling-upgrade from A to B?


The presence of the 2.0.2-alpha stuff in the field complicates things. I
know you want upgrades, and I'm sure others do too, but if that became an
approved version, there's a conflict with the "-1 version supported" rule
of wire compatibility: does that rule get changed?


 - Client-server wire compatibility: can I use old clients to talk to an
 upgraded cluster?


IMO we should move clients off the intra-cluster protocol and onto
WebHDFS and the HCat job APIs, with a hard split between public and
private interfaces. That includes distcp. As WebHDFS is in 1.x and later,
that's the one to care about.


 Depending on the user's expectations and needs, different factors above may
 be significantly more or less important. And different portions of the
 software may have different levels of stability in each of the areas. As
 I've mentioned in previous threads, my experience supporting production
 Hadoop 1.x and Hadoop 2.x HDFS clusters has led me to believe that 2.x,
 while alpha, is significantly less prone to data-loss bugs than Hadoop
 1.x.


I hope you are right; it's where everything is going.


 But, with some of the changes in the proposed 2.0.3-alpha, it
 wouldn't be wire-protocol-stable.


I don't know of anyone who wanted that, anyone who said "let's create chaos
and confusion"; it was just a consequence of fixing things against an alpha
release.


 How can we best devise a scheme that explains the various factors above in
 a more detailed way than one big red warning sticker? Which of the above
 factors does the community think would be implied by GA?


Let's see

 $ ant -version
 Apache Ant(TM) version 1.9.0alpha compiled on November 12 2012


Yes, Ant says anything you build locally is an alpha release.

In that context, it's no different from -SNAPSHOT, except that it's easier
to field bug reports against, because they are at least replicable; things
downstream can be updated to work with the alpha and test against it.

I view beta as the transition to feature-complete: bug and regression fixes
only, with some triage, and only patches that don't cause visible regressions.

Shipping is pretty much bug fixes only, with serious triage; only the widely
visible things happen after that. Critical integrity and performance issues
merit new updates.

Security fixes: out-of-band emergency updates. This is a good reason for
leaving security out of anything: a simpler support model. Unlike Oracle, I
don't think security plugins should have side effects other than fixing the
security hole.

Maven complicates things, as you can't ever withdraw a release there, not
even for security reasons. It's why ops groups prefer ops-managed RPM and
deb updates for rolling out new binaries of any form to a pool of boxes, at
the expense of the application having control of its classpath (Ant has
some special classpath setup to support OS-based installations, BTW).

The way I've always viewed alpha and beta tags in Apache projects is this:

   - you don't care about regressions of behaviour from features that
   weren't in the previous full release
   - the way you field all bug reports is to ask "is it gone in the latest
   release on that branch?" (*)
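The phase rules described above can be condensed into a small routing function. This is a hypothetical Python sketch of one reading of those rules, not an Apache or Hadoop policy: the change categories and return strings are assumptions, the "accept" entries for alpha are guesses the text does not state, and so is treating every regression as relevant once a release is shipping.

```python
from enum import Enum


class Phase(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    SHIPPING = "shipping"


# How much triage a bug fix gets in each phase (one reading of the text).
BUG_TRIAGE = {
    Phase.ALPHA: "accept",  # assumption: the text sets no bar for alpha
    Phase.BETA: "some triage",
    Phase.SHIPPING: "serious triage",
}


def handle_change(phase, kind):
    """Route a proposed change; `kind` is "feature", "bug" or "security"."""
    if kind == "security":
        # Security fixes bypass the phase rules as out-of-band updates.
        return "out-of-band emergency update"
    if kind == "feature":
        # Beta is the transition to feature-complete, so new features stop there.
        return "accept" if phase is Phase.ALPHA else "defer"
    if kind == "bug":
        return BUG_TRIAGE[phase]
    raise ValueError(f"unknown change kind: {kind!r}")


def regression_counts(phase, feature_in_last_full_release):
    """Alpha/beta rule: regressions only matter for features that the
    previous full release already had."""
    if phase is Phase.SHIPPING:
        return True  # assumption: the rule above only covers alpha/beta tags
    return feature_in_last_full_release
```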

The big change in Hadoop is the filesystem: nobody wants to lose