Re: [DISCUSS] More Maintenance Releases

2015-06-22 Thread Allen Wittenauer

If 2.6 is the target, someone will have to verify that any 
cherry-picked patches actually work with JDK6 since the PMC voted to officially 
kill backward compatibility in a minor release. It’s going to be easier and 
probably smarter to fix 2.7 if that’s really desired. [1]

Frankly, I’d rather see effort spent on stabilizing trunk and ditching 
the now broken branch-2.  We’re approaching the 4 year anniversary of 0.23.0’s 
release (which later begat 2.x, which is already past the 3 year mark).  It’s 
hard to claim health when it's been so long since a branch off of trunk was cut 
and turned into something official.  

[1] Kengo and I are hard at work getting multi-JDK testing working in Yetus, but 
it’s not quite ready for prime time. :( It could certainly help here, but… it’s 
not very stable yet.

On Jun 22, 2015, at 7:50 AM, Karthik Kambatla ka...@cloudera.com wrote:

 Thanks for starting this thread, Akira.
 
 +1 to more maintenance releases. More stable upstream releases avoid
 duplicating cherry-pick work across consumers/vendors, and show the
 maturity of the project to users.
 
 I see value in backporting blocker/critical issues, but have mixed feelings
 about doing the same for major/minor/trivial issues. IMO, every commit has
 non-zero potential to introduce other bugs. Depending on the kind of fix
 (say, documentation), it might be okay to include these non-critical fixes.
 One approach could be to allow all bug fixes for 2.x.1, blocker/critical
 for 2.x.2, blocker for 2.x.3 (or something along those lines) to ensure
 increasing stability of maintenance releases?
 
 I am also +1 to any committer picking up RM duties for a maintenance
 release. It is healthy to have more people participate in the release
 process, so long as we have some method to maintenance release madness.
 
 A committer (who is not yet a PMC member) could be a Release Manager, but
 his vote is not binding for the release. I RM-ed the 2.5.x releases as a
 committer. RM-ing a release and voting non-binding could be a good way to
 remind the PMC to include the committer in the PMC :)
 
 Cheers
 Karthik
 



Re: [DISCUSS] More Maintenance Releases

2015-06-22 Thread Vinayakumar B
+1 for the idea of maintenance releases.

Considering the amount of code change between trunk and branch-2,
cherry-picking may not be easy and straightforward for all issues.
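
For readers following along, here is a minimal sketch of what a single backport involves; the commit hash, file names, and test command are placeholders rather than details from any real JIRA:

# Hypothetical backport of one fix from trunk to branch-2.6.
git checkout branch-2.6
git cherry-pick -x <commit-hash-from-trunk>   # -x records the original commit id in the message
# If the pick conflicts because trunk and branch-2 have diverged:
git status                                    # shows the conflicting files
# ...resolve the conflicts by hand, then...
git add <resolved-files>
git cherry-pick --continue
mvn -q clean test                             # re-run the affected module's tests before pushing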

I would love to help in cherry-picking the fixes and reviewing them.

I would also love to help in release process.


Regards,
Vinay

On Mon, Jun 22, 2015 at 9:49 PM, Allen Wittenauer a...@altiscale.com wrote:


 If 2.6 is the target, someone will have to verify that any
 cherry-picked patches actually work with JDK6 since the PMC voted to
 officially kill backward compatibility in a minor release. It’s going to be
 easier and probably smarter to fix 2.7 if that’s really desired. [1]

 Frankly, I’d rather see effort spent on stabilizing trunk and
 ditching the now broken branch-2.  We’re approaching the 4 year anniversary
 of 0.23.0’s release (which later begat 2.x, which is already past the 3
 year mark).  It’s hard to claim health when it's been so long since a branch
 off of trunk was cut and turned into something official.

 [1] Kengo and I are hard at work getting multi-JDK testing working in
 Yetus, but it’s not quite ready for prime time. :( It could certainly help
 here, but… it’s not very stable yet.





Re: [DISCUSS] More Maintenance Releases

2015-06-22 Thread Karthik Kambatla
Thanks for starting this thread, Akira.

+1 to more maintenance releases. More stable upstream releases avoid
duplicating cherry-pick work across consumers/vendors, and show the
maturity of the project to users.

I see value in backporting blocker/critical issues, but have mixed feelings
about doing the same for major/minor/trivial issues. IMO, every commit has
non-zero potential to introduce other bugs. Depending on the kind of fix
(say, documentation), it might be okay to include these non-critical fixes.
One approach could be to allow all bug fixes for 2.x.1, blocker/critical
for 2.x.2, blocker for 2.x.3 (or something along those lines) to ensure
increasing stability of maintenance releases?

I am also +1 to any committer picking up RM duties for a maintenance
release. It is healthy to have more people participate in the release
process, so long as we have some method to maintenance release madness.

A committer (who is not yet a PMC member) could be a Release Manager, but
his vote is not binding for the release. I RM-ed the 2.5.x releases as a
committer. RM-ing a release and voting non-binding could be a good way to
remind the PMC to include the committer in the PMC :)

Cheers
Karthik

On Mon, Jun 22, 2015 at 4:36 AM, Tsuyoshi Ozawa oz...@apache.org wrote:

 Hi Akira,

 Thank you for starting an interesting topic. +1 on the idea of More
 Maintenance Releases for old branches. It would be good if this
 activity were coupled more closely with Apache Yetus for users.

 BTW, I don't know whether a committer who is not a PMC member can be a release
 manager. Does anyone know about this?  It's described in detail as
 follows: http://hadoop.apache.org/bylaws#Decision+Making

  Release Manager
  A Release Manager (RM) is a committer who volunteers to produce a
 Release Candidate according to HowToRelease.
 
  Project Management Committee
  Deciding what is distributed as products of the Apache Hadoop project.
 In particular all releases must be approved by the PMC

 Thanks,
 - Tsuyoshi





-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-22 Thread Nick Dimiduk
On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe cmcc...@apache.org
wrote:

 You mentioned that most of our project will be focused on shell
 scripts I guess based on the existing test-patch code.  Allen did a
 lot of good work in this area recently.  I am curious if you evaluated
 languages such as Python or Node.js for this use-case.  Shell scripts
 can get a little... tricky beyond a certain size.  On the other hand,
 if we are standardizing on shell, which shell and which version?
 Perhaps bash 3.5+?


I'll also add that shell is not helpful for a cross-platform set of
tooling. I recently added a daemon to Apache Phoenix; an explicit
requirement was Windows support. I ended up implementing a solution in
python because that environment is platform-agnostic and still systems-y
enough. I think this is something this project should seriously consider.

-n


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-22 Thread Andrew Purtell
On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe cmcc...@apache.org
 wrote:

  You mentioned that most of our project will be focused on shell
  scripts I guess based on the existing test-patch code.  Allen did a
  lot of good work in this area recently.  I am curious if you evaluated
  languages such as Python or Node.js for this use-case.  Shell scripts
  can get a little... tricky beyond a certain size.  On the other hand,
  if we are standardizing on shell, which shell and which version?
  Perhaps bash 3.5+?
 

 I'll also add that shell is not helpful for a cross-platform set of
 tooling. I recently added a daemon to Apache Phoenix; an explicit
 requirement was Windows support. I ended up implementing a solution in
 python because that environment is platform-agnostic and still systems-y
 enough. I think this is something this project should seriously consider.


In my opinion, historically, test-patch hasn't needed to be cross-platform
because the only first-class development environment for Hadoop has been
Linux. Growing beyond this could absolutely be one focus of Yetus, should
that be a consensus goal of the community. The seed of the project, though,
is today's test-patch, which is implemented in bash. That's where we are
today. Language discussions (smile) can and should be forward-looking.



[jira] [Created] (HADOOP-12109) Distcp of file > 5GB to swift fails with HTTP 413 error

2015-06-22 Thread Phil D'Amore (JIRA)
Phil D'Amore created HADOOP-12109:
-

 Summary: Distcp of file > 5GB to swift fails with HTTP 413 error
 Key: HADOOP-12109
 URL: https://issues.apache.org/jira/browse/HADOOP-12109
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/swift
Affects Versions: 2.6.0
Reporter: Phil D'Amore


Trying to use distcp to copy a file of more than 5GB to a Swift filesystem results in a 
stack trace like the following:

15/06/01 20:58:57 ERROR util.RetriableCommand: Failure in Retriable command: 
Copying hdfs://xxx:8020/path/to/random-5Gplus.dat to swift://xxx/5Gplus.dat
Invalid Response: Method COPY on 
http://xxx:8080/v1/AUTH_fb7a8901dd8d4c8dba27f5e5d55a46a9/test/.distcp.tmp.attempt_local1097967418_0001_m_00_0
 failed, status code: 413, status line: HTTP/1.1 413 Request Entity Too Large  
COPY 
http://xxx:8080/v1/AUTH_fb7a8901dd8d4c8dba27f5e5d55a46a9/test/.distcp.tmp.attempt_local1097967418_0001_m_00_0
 => 413 : <html><h1>Request Entity Too Large</h1><p>The body of your request 
was too large for this server.</p></html>
at 
org.apache.hadoop.fs.swift.http.SwiftRestClient.buildException(SwiftRestClient.java:1502)
at 
org.apache.hadoop.fs.swift.http.SwiftRestClient.perform(SwiftRestClient.java:1403)
at 
org.apache.hadoop.fs.swift.http.SwiftRestClient.copyObject(SwiftRestClient.java:923)
at 
org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.copyObject(SwiftNativeFileSystemStore.java:765)
at 
org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.rename(SwiftNativeFileSystemStore.java:617)
at 
org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.rename(SwiftNativeFileSystem.java:577)
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.promoteTmpToTarget(RetriableFileCopyCommand.java:220)
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:137)
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

It looks like the problem actually occurs in the rename operation which happens 
after the copy.  The rename is implemented as a copy/delete, and this secondary 
copy looks like it's not done in a way that breaks up the file into smaller 
chunks.  

It looks like the following bug:

https://bugs.launchpad.net/sahara/+bug/1428941

It does not look like the fix for this is incorporated into Hadoop's Swift 
client.
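
For context, the failing job has the shape of the distcp invocation below; endpoint, container, and paths are placeholders mirroring the redacted log above. Swift's default limit for a single object (and therefore for a single server-side COPY) is 5 GB, which is why only larger files hit the 413:

# Hypothetical reproduction; host, container, and paths are placeholders.
hadoop distcp hdfs://namenode:8020/path/to/random-5Gplus.dat \
    swift://<container>.<service>/5Gplus.dat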



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-12110) Consolidate usage of JSON libraries

2015-06-22 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved HADOOP-12110.

Resolution: Invalid

Opened against the wrong project. Sorry, closing as invalid.

 Consolidate usage of JSON libraries
 ---

 Key: HADOOP-12110
 URL: https://issues.apache.org/jira/browse/HADOOP-12110
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Eric Yang
Assignee: Eric Yang

 Chukwa uses the JSON jar from json.org and also json-simple from Google Code.  It 
 would be nice if we used only one JSON implementation, to be consistent.  
 Minidev json-smart was also considered as a replacement for json-simple to 
 improve performance, but it doesn't handle some characters correctly.  
 Therefore, it's best to use json-simple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12110) Consolidate usage of JSON libraries

2015-06-22 Thread Eric Yang (JIRA)
Eric Yang created HADOOP-12110:
--

 Summary: Consolidate usage of JSON libraries
 Key: HADOOP-12110
 URL: https://issues.apache.org/jira/browse/HADOOP-12110
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Eric Yang
Assignee: Eric Yang


Chukwa uses the JSON jar from json.org and also json-simple from Google Code.  It 
would be nice if we used only one JSON implementation, to be consistent.  
Minidev json-smart was also considered as a replacement for json-simple to improve 
performance, but it doesn't handle some characters correctly.  Therefore, it's 
best to use json-simple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] More Maintenance Releases

2015-06-22 Thread Tsuyoshi Ozawa
Hi Akira,

Thank you for starting an interesting topic. +1 on the idea of More
Maintenance Releases for old branches. It would be good if this
activity were coupled more closely with Apache Yetus for users.

BTW, I don't know whether a committer who is not a PMC member can be a release
manager. Does anyone know about this?  It's described in detail as
follows: http://hadoop.apache.org/bylaws#Decision+Making

 Release Manager
 A Release Manager (RM) is a committer who volunteers to produce a Release 
 Candidate according to HowToRelease.

 Project Management Committee
 Deciding what is distributed as products of the Apache Hadoop project. In 
 particular all releases must be approved by the PMC

Thanks,
- Tsuyoshi

On Mon, Jun 22, 2015 at 6:43 PM, Akira AJISAKA
ajisa...@oss.nttdata.co.jp wrote:
 Hi everyone,

 At Hadoop Summit, I joined the HDFS BoF and heard from Jason Lowe that Apache
 Hadoop developers at Yahoo!, Twitter, and other non-distributors work very
 hard to maintain Hadoop by cherry-picking patches to their own branches.

 I want to share the work with the community. If we can cherry-pick bug fix
 patches and have more maintenance releases, it would be a great help not only
 to users but also to developers who work very hard to stabilize their own
 branches.

 To have more maintenance releases, I propose two changes:

 * Major/Minor/Trivial bug fixes can be cherry-picked
 * (Roughly) Monthly maintenance release

 I would like to start the work from branch-2.6. If the change is
 accepted by the community, I'm willing to work on the maintenance releases as a
 release manager.

 Best regards,
 Akira


[DISCUSS] More Maintenance Releases

2015-06-22 Thread Akira AJISAKA

Hi everyone,

At Hadoop Summit, I joined the HDFS BoF and heard from Jason Lowe that 
Apache Hadoop developers at Yahoo!, Twitter, and other non-distributors 
work very hard to maintain Hadoop by cherry-picking patches to their 
own branches.


I want to share the work with the community. If we can cherry-pick bug 
fix patches and have more maintenance releases, it would be a great help 
not only to users but also to developers who work very hard to 
stabilize their own branches.


To have more maintenance releases, I propose two changes:

* Major/Minor/Trivial bug fixes can be cherry-picked
* (Roughly) Monthly maintenance release

I would like to start the work from branch-2.6. If the change is 
accepted by the community, I'm willing to work on the maintenance releases 
as a release manager.


Best regards,
Akira


[jira] [Resolved] (HADOOP-12108) Erroneous behavior of use of wildcard character ( * ) in ls command of hdfs

2015-06-22 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved HADOOP-12108.
---
Resolution: Invalid

Thanks Aman! Steve is right. You do need to use quotes when there is already a 
file on the local file system which would match the wildcard.
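
To make that concrete, the difference is which side expands the glob; the paths below follow the example in the report:

# Unquoted: the local shell expands the glob against the LOCAL filesystem first,
# so the command only ever sees /data/hadoop/sample/00 and /data/hadoop/sample/01.
hdfs dfs -ls -R /data/hadoop/sample/*

# Quoted: the pattern reaches the hdfs client untouched and is matched against
# HDFS, so 00, 01 and 02 are all listed.
hdfs dfs -ls -R '/data/hadoop/sample/*'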

 Erroneous behavior of use of wildcard character ( * ) in ls command of hdfs 
 

 Key: HADOOP-12108
 URL: https://issues.apache.org/jira/browse/HADOOP-12108
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Aman Goyal
Priority: Critical

 If you have the following directories in your LOCAL file system:
 /data/hadoop/sample/00/contents1.txt
 /data/hadoop/sample/01/contents2.txt
 and the following directories in HDFS:
 /data/hadoop/sample/00/contents1.txt
 /data/hadoop/sample/01/contents2.txt
 /data/hadoop/sample/02/contents3.txt
 suppose you run the following hdfs ls command:
 hdfs dfs -ls -R /data/hadoop/sample/*
 the paths that are printed have a reference to local paths, and only the 00 & 01 
 directories get listed. 
 This happens only when the wildcard (*) character is used in input paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-22 Thread Colin P. McCabe
+1 for making this a separate project.  We've always struggled with a
lot of forks of the test-patch code and perhaps this project can help
create something that works well for multiple projects.

Bypassing the incubator seems kind of weird (I didn't know that was an
option) but I will let other people with more experience in the ASF
comment on that.

You mentioned that most of our project will be focused on shell
scripts I guess based on the existing test-patch code.  Allen did a
lot of good work in this area recently.  I am curious if you evaluated
languages such as Python or Node.js for this use-case.  Shell scripts
can get a little... tricky beyond a certain size.  On the other hand,
if we are standardizing on shell, which shell and which version?
Perhaps bash 3.5+?
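
If a minimum bash version were agreed on, it could be enforced up front; a minimal sketch, assuming a hypothetical 3.2 floor (the version number here is only an example, not a decision):

#!/usr/bin/env bash
# Guard against running on a bash older than the agreed minimum (3.2 used as an example).
if [[ ${BASH_VERSINFO[0]} -lt 3 ]] \
   || [[ ${BASH_VERSINFO[0]} -eq 3 && ${BASH_VERSINFO[1]} -lt 2 ]]; then
  echo "ERROR: bash 3.2 or newer is required (found ${BASH_VERSION})" >&2
  exit 1
fi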

Also, what will be the mechanism for customizing this for each
project?  Ideally the customizations needed would be small so we could
share the most code.

cheers,
Colin


On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey bus...@cloudera.com wrote:
 I'm going to try responding to several things at once here, so apologies if
 I miss anyone and sorry for the long email. :)


 On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran ste...@hortonworks.com
 wrote:

 I think it's good to have a general build/test process projects can share,
 so +1 to pulling it out. You should get help from others.

 regarding incubation, it is a lot of work, especially for something that's
 more of an in-house tool than an artifact to release and redistribute.

 You can't just use apache labs or the build project's repo to work on this?

 if you do want to incubate, we may want to nominate the hadoop project as
 the monitoring PMC, rather than incubator@.

 -steve


 Important note: we're proposing a board resolution that would directly pull
 this code base out into a new TLP; there'd be no incubator, we'd just
 continue building community and start making releases.

 The proposed PMC believes the tooling we're talking about has direct
 applicability to projects well outside of the ASF. Lots of other open
 source projects run on community contributions and have a general need for
 better QA tools. Given that problem set and the presence of a community
 working to solve it, there's no reason this needs to be treated as an
 in-house build project. We certainly want to be useful to ASF projects and
 getting them on-board given our current optimization for ASF infra will
 certainly be easier, but we're not limited to that (and our current
 prerequisites, a CI tool and jira or github, are pretty broadly available).


 On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk ndimi...@apache.org wrote:


 Since we're tossing out names, how about Apache Bootstrap? It's a
 meta-project to help other projects get off the ground, after all.



 There's already a web development framework named Bootstrap[1]. It's also
 used by several ASF projects, so I think it best to avoid the confusion.

 The name is, of course, up to the proposed PMC. As a bit of background, the
 current name Yetus fulfills Allen's desire to have something shell related
 and my desire to have a project that starts with Y (there are currently no
 ASF projects that start with Y). The universe of names that fill in these
 two is very small, AFAICT. I did a brief suitability search and didn't find
 any blockers.


  On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer a...@altiscale.com
  wrote:


 Since a couple of people have brought it up:

 I think the release question is probably one of the big question
 marks.  Other than tar balls, how does something like this actually get
 used downstream?

 For test-patch, in particular, I have a few thoughts on this:

 Short term:

 * Projects that want to move RIGHT NOW would modify their Jenkins
 jobs to checkout from the Yetus repo (preferably at a well known tag or
 branch) in one directory and their project repo in another directory.  Then
 it’s just a matter of passing the correct flags to test-patch.  This is
 pretty much how I’ve been personally running test-patch for about 6 months
 now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.

 * Create a stub version of test-patch that projects could check
 into their repo, replacing the existing test-patch.  This stub version
 would git clone from either ASF or github and then execute test-patch
 accordingly on demand.  With the correct smarts, it could make sure it has
 a cached version to prevent continual clones.

 Longer term:

 * I’ve been toying with the idea of (ab)using Java repos and
 packaging as a transportation layer, either in addition or in combination
 with something like a maven plugin.  Something like this would clearly be
 better for offline usage and/or to lower the network traffic.
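
As an illustration of the first, short-term option only — the repository URLs and tag are placeholders, and the exact test-patch flags are deliberately left out rather than guessed:

# Hypothetical Jenkins "Execute shell" step: Yetus and the project live in sibling directories.
git clone https://git-wip-us.apache.org/repos/asf/yetus.git yetus     # placeholder repo URL
git -C yetus checkout <well-known-tag-or-branch>                      # pin to a known-good point
git clone https://git-wip-us.apache.org/repos/asf/hadoop.git project  # placeholder project repo
# test-patch is then run from the yetus checkout and pointed at ./project
# and at the patch under test via its command-line flags.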


 It's important that the project follow ASF guidelines on publishing
 releases[2]. So long as we publish releases to the distribution directory I
 think we'd be fine having 

Re: [DISCUSS] More Maintenance Releases

2015-06-22 Thread Colin P. McCabe
+1 for creating maintenance releases with a more rapid release
cadence and more effort put into stability backports.  I think this
would really be great for the project.

Colin

On Mon, Jun 22, 2015 at 2:43 AM, Akira AJISAKA
ajisa...@oss.nttdata.co.jp wrote:
 Hi everyone,

 At Hadoop Summit, I joined the HDFS BoF and heard from Jason Lowe that Apache
 Hadoop developers at Yahoo!, Twitter, and other non-distributors work very
 hard to maintain Hadoop by cherry-picking patches to their own branches.

 I want to share the work with the community. If we can cherry-pick bug fix
 patches and have more maintenance releases, it would be a great help not only
 to users but also to developers who work very hard to stabilize their own
 branches.

 To have more maintenance releases, I propose two changes:

 * Major/Minor/Trivial bug fixes can be cherry-picked
 * (Roughly) Monthly maintenance release

 I would like to start the work from branch-2.6. If the change is
 accepted by the community, I'm willing to work on the maintenance releases as a
 release manager.

 Best regards,
 Akira


[jira] [Created] (HADOOP-12111) Split test-patch off into its own TLP

2015-06-22 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-12111:
-

 Summary: Split test-patch off into its own TLP
 Key: HADOOP-12111
 URL: https://issues.apache.org/jira/browse/HADOOP-12111
 Project: Hadoop Common
  Issue Type: Bug
  Components: yetus
Reporter: Allen Wittenauer


Given test-patch's tendency to get forked into a variety of different projects, 
it makes a lot of sense to make it an Apache TLP so that everyone can benefit from 
a common code base.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12108) Erroneous behavior of use of wildcard character ( * ) in ls command of hdfs

2015-06-22 Thread Aman Goyal (JIRA)
Aman Goyal created HADOOP-12108:
---

 Summary: Erroneous behavior of use of wildcard character ( * ) in 
ls command of hdfs 
 Key: HADOOP-12108
 URL: https://issues.apache.org/jira/browse/HADOOP-12108
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Aman Goyal
Priority: Critical


If you have the following directories in your LOCAL file system:
/data/hadoop/sample/00/contents1.txt
/data/hadoop/sample/01/contents2.txt

and the following directories in HDFS:
/data/hadoop/sample/00/contents1.txt
/data/hadoop/sample/01/contents2.txt
/data/hadoop/sample/02/contents3.txt

suppose you run the following hdfs ls command:
hdfs dfs -ls -R /data/hadoop/sample/*

the paths that are printed have a reference to local paths, and only the 00 & 01 
directories get listed. 

This happens only when the wildcard (*) character is used in input paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] More Maintenance Releases

2015-06-22 Thread Sean Busbey
More maintenance releases would be excellent.


If y'all are going to make more releases on the 2.6 line, please consider
backporting HADOOP-11710, as without it HBase is unusable on top of HDFS
encryption. It's been inconvenient that the fix is only available in a
non-production release line.

-Sean

On Mon, Jun 22, 2015 at 6:36 AM, Tsuyoshi Ozawa oz...@apache.org wrote:

 Hi Akira,

 Thank you for starting an interesting topic. +1 on the idea of More
 Maintenance Releases for old branches. It would be good if this
 activity were coupled more closely with Apache Yetus for users.

 BTW, I don't know whether a committer who is not a PMC member can be a release
 manager. Does anyone know about this?  It's described in detail as
 follows: http://hadoop.apache.org/bylaws#Decision+Making

  Release Manager
  A Release Manager (RM) is a committer who volunteers to produce a
 Release Candidate according to HowToRelease.
 
  Project Management Committee
  Deciding what is distributed as products of the Apache Hadoop project.
 In particular all releases must be approved by the PMC

 Thanks,
 - Tsuyoshi





-- 
Sean