Re: Branch merges and 3.0.0-beta1 scope

2017-08-22 Thread Allen Wittenauer
We should avoid turning this into a replay of Apache Hadoop 2.6.0 (and 
to a lesser degree, 2.7.0 and 2.8.0) where a bunch of last minute 
“experimental” features derail stability for a significantly long period of 
time.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Map reduce sample program

2017-08-22 Thread Daniel Templeton

On 8/19/17 3:28 AM, Remil Mohanan wrote:

I am trying to pass multiple non-key values from mapper to reducer.


The only way to pass data from the mapper to the reducer is by emitting 
key-value pairs.  One common trick is to designate a special key as 
the out-of-band information key and then use a custom sorting comparator 
to make sure that key comes first in the sort order.  I'm sure you can 
find examples online.
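As a rough illustration of the ordering half of that trick, here is a stand-alone sketch. It assumes String keys and a hypothetical reserved key `__META__`, and it uses only the JDK so it runs without Hadoop. In a real job the same rule would have to live in a `RawComparator`/`WritableComparator` registered with `Job.setSortComparatorClass(...)`. With more than one reducer, the special record would also need to reach every partition, for example through a custom `Partitioner`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * JDK-only sketch of the "out-of-band key" trick: a reserved key is forced
 * to sort ahead of every real key, so data emitted under it is seen first.
 */
class OutOfBandKeySort {
    // Hypothetical reserved key; it only has to differ from all real keys.
    static final String META_KEY = "__META__";

    // Puts META_KEY first; every other pair falls back to natural String order.
    static final Comparator<String> META_FIRST = (a, b) -> {
        boolean aMeta = META_KEY.equals(a);
        boolean bMeta = META_KEY.equals(b);
        if (aMeta && bMeta) {
            return 0;
        }
        if (aMeta) {
            return -1;
        }
        if (bMeta) {
            return 1;
        }
        return a.compareTo(b);
    };

    // Returns a sorted copy of the keys, as the shuffle would order them.
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(META_FIRST);
        return sorted;
    }
}
```

The comparator, not the spelling of the key, decides its position. Natural ordering alone would place `__META__` after keys that start with an uppercase letter.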



Similarly, for reading and writing a file inside HDFS other than the 
normal read and write.



I don't understand.  Reading and writing a file in HDFS from an MR task 
works exactly the same as doing it from a stand-alone program. You 
probably want to do it in the setup() method, though.
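Here is a stand-alone sketch of that lifecycle: a small tab-separated side file is loaded once in a `setup` step and then reused for every record. The class and method names are hypothetical, and the code reads from the local filesystem with `java.nio.file` so it runs without Hadoop. Inside an actual `Mapper`, the body of `setup(Context)` would instead open the file with `FileSystem.get(context.getConfiguration()).open(new org.apache.hadoop.fs.Path(...))`, which is the same client API a stand-alone program uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Local-filesystem stand-in for a mapper that reads a side file once and
 * reuses it per record. Only the lifecycle (setup once, map many) mirrors Hadoop.
 */
class SideFileMapper {
    private final Map<String, String> lookup = new HashMap<>();

    // Called once per task, analogous to Mapper.setup(Context).
    void setup(Path sideFile) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(sideFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Expect "key<TAB>value"; lines without a tab are skipped.
                int tab = line.indexOf('\t');
                if (tab > 0) {
                    lookup.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
    }

    // Called once per input record, analogous to Mapper.map(...).
    String map(String key) {
        return lookup.getOrDefault(key, "UNKNOWN");
    }
}
```

Loading in `setup` means the file is read once per task, not once per input record.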


Daniel




Re: Branch merges and 3.0.0-beta1 scope

2017-08-22 Thread Ray Chiang

On 8/22/17 3:20 AM, Steve Loughran wrote:


> On 21 Aug 2017, at 22:22, Vinod Kumar Vavilapalli wrote:
>
>> Steve,
>>
>> You can be strict & ruthless about the timelines. Anything that doesn’t get in
>> by mid-September, as was originally planned, can move to the next release - whether
>> it is feature work on branches or feature work on trunk.
>>
>> The problem I see here is that code & branches being worked on for a year are
>> now (apparently) close to being done and we are telling them to hold for 7 more
>> months - this is not a reasonable ask.
>>
>> If you are advocating for a 3.1 plan, I’m sure one of these branch ‘owners’ can
>> volunteer. But this is how you get competing releases and split bandwidth.
>>
>> As for compatibility / testing etc, it seems like there is a belief that the
>> current ‘scoped’ features are all tested well in these areas and so adding more
>> is going to hurt the release. There is no way this is the reality, trunk has so
>> many features that have been landing for years, the only way we can
>> collectively attempt towards making this stable is by getting as many parties
>> together as possible, each verifying stuff that they need. Not by excluding
>> specific features.
>
> If everyone is confident & it's coming together, it does make sense. I think
> those of us (myself included) who are merging stuff in do have to recognise that we
> really need to follow it through by being responsive to any problem - and with the
> release manager having the right to pull things out if it's felt to be significantly
> threatening the stability of the final 3.0 release.
>
> I think we should also consider making the 3.0 beta the feature freeze; after that, fixes
> to the features go in, but nothing else of significance. Otherwise the value of the beta
> ("test this code more broadly") is diminished.

At this point, there have been three planned alphas from September 2016 
until July 2017 to "get in features".  While a couple of upcoming 
features are "a few weeks" away, I think all of us are aware how 
predictable software development schedules can be.  I think we can also 
all agree that rushing just to meet a release deadline isn't the best 
practice when it comes to software development either.


Andrew has been very clear about his goals at each step and I think 
Wangda's willingness to not rush in resource types was an appropriate 
response.  I'm sympathetic to the goals of getting in a feature for 3.0, 
but it might be a good idea for each project that is a "few weeks away" 
to take a serious look at its readiness compared to the features that have 
already been in testing for 6+ months.


-Ray





Re: [VOTE] Merge feature branch YARN-5355 (Timeline Service v2) to trunk

2017-08-22 Thread Vinod Kumar Vavilapalli
Such a great community effort - hats off, team!

Thanks
+Vinod

> On Aug 21, 2017, at 11:32 PM, Vrushali Channapattan wrote:
> 
> Hi folks,
> 
> Per earlier discussion [1], I'd like to start a formal vote to merge
> feature branch YARN-5355 [2] (Timeline Service v.2) to trunk. The vote will
> run for 7 days, and will end August 29 11:00 PM PDT.
> 
> We have previously completed one merge onto trunk [3] and Timeline Service
> v2 has been part of Hadoop release 3.0.0-alpha1.
> 
> Since then, we have been working on extending the capabilities of Timeline
> Service v2 in a feature branch [2] for a while, and we are reasonably
> confident that the state of the feature meets the criteria to be merged
> onto trunk. We'd love folks to get their hands on it in a test capacity
> and provide valuable feedback so that we can make it production-ready.
> 
> In a nutshell, Timeline Service v.2 delivers significant scalability and
> usability improvements based on a new architecture. What we would like to
> merge to trunk is termed "alpha 2" (milestone 2). The feature has a
> complete end-to-end read/write flow with security and read-level
> authorization via whitelists. You should be able to start setting it up and
> testing it.
> 
> At a high level, the following are the key features that have been
> implemented since alpha1:
> - Security via Kerberos Authentication and delegation tokens
> - Simple read-side authorization via whitelist
> - Client-configurable entity sort ordering
> - Richer REST APIs for apps, app attempts, containers, fetching metrics by
> time range, pagination, sub-app entities
> - Support for storing sub-application entities (entities that exist outside
> the scope of an application)
> - Configurable TTLs (time-to-live) for tables, configurable table prefixes,
> configurable HBase cluster
> - Flow-level aggregations done as dynamic (table-level) coprocessors
> - Uses the latest stable HBase release, 1.2.6
> 
> There are a total of 82 subtasks that were completed as part of this effort.
> 
> We paid close attention to ensuring that Timeline Service v.2 does not
> impact existing functionality when it is disabled (which is the default).
> 
> Special thanks to a team of folks who worked hard and contributed towards
> this effort with patches, reviews and guidance: Rohith Sharma K S, Varun
> Saxena, Haibo Chen, Sangjin Lee, Li Lu, Vinod Kumar Vavilapalli, Joep
> Rottinghuis, Jason Lowe, Jian He, Robert Kanter, Michael Stack.
> 
> Regards,
> Vrushali
> 
> [1] http://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27383.html
> [2] https://issues.apache.org/jira/browse/YARN-5355
> [3] https://issues.apache.org/jira/browse/YARN-2928
> [4] https://github.com/apache/hadoop/commits/YARN-5355





Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-08-22 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/

[Aug 21, 2017 6:08:38 PM] (manojpec) HDFS-11988. Verify HDFS Snapshots with open files captured are
[Aug 21, 2017 6:48:51 PM] (arp) HDFS-12325. SFTPFileSystem operations should restore cwd. Contributed by
[Aug 21, 2017 8:45:30 PM] (jzhuge) HDFS-11738. Hedged pread takes more time when block moved from initial
[Aug 22, 2017 5:43:08 AM] (Arun Suresh) YARN-5603. Metrics for Federation StateStore. (Ellen Hui via asuresh)
[Aug 22, 2017 5:50:24 AM] (Arun Suresh) YARN-6923. Metrics for Federation Router. (Giovanni Matteo Fumarola via




-1 overall


The following subsystems voted -1:
findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   
   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
       Hard coded reference to an absolute pathname in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext) At DockerLinuxContainerRuntime.java:[line 490]

Failed junit tests :

   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration 
   hadoop.hdfs.TestReconstructStripedFile 
   hadoop.hdfs.server.datanode.TestDirectoryScanner 
   hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency 
   hadoop.hdfs.server.namenode.TestDecommissioningStatus 
   hadoop.hdfs.TestPread 
   hadoop.hdfs.server.datanode.TestDataNodeUUID
   hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
   hadoop.yarn.sls.appmaster.TestAMSimulator 
   hadoop.yarn.sls.nodemanager.TestNMSimulator 

Timed out junit tests :

   org.apache.hadoop.hdfs.TestLeaseRecovery2
   org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

   cc:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-compile-cc-root.txt  [4.0K]

   javac:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-compile-javac-root.txt  [296K]

   checkstyle:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-checkstyle-root.txt  [17M]

   pylint:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-patch-pylint.txt  [20K]

   shellcheck:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-patch-shellcheck.txt  [20K]

   shelldocs:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-patch-shelldocs.txt  [12K]

   whitespace:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/whitespace-eol.txt  [11M]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/whitespace-tabs.txt  [1.2M]

   findbugs:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html  [8.0K]

   javadoc:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-javadoc-javadoc-root.txt  [1.9M]

   unit:

      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt  [672K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt  [64K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt  [16K]

Powered by Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org


Re: Branch merges and 3.0.0-beta1 scope

2017-08-22 Thread Steve Loughran

> On 21 Aug 2017, at 22:22, Vinod Kumar Vavilapalli wrote:
> 
> Steve,
> 
> You can be strict & ruthless about the timelines. Anything that doesn’t get 
> in by mid-September, as was originally planned, can move to the next release 
> - whether it is feature work on branches or feature work on trunk.
> 
> The problem I see here is that code & branches being worked on for a year are 
> now (apparently) close to being done and we are telling them to hold for 7 
> more months - this is not a reasonable ask.
> 
> If you are advocating for a 3.1 plan, I’m sure one of these branch ‘owners’ 
> can volunteer. But this is how you get competing releases and split bandwidth.
> 
> As for compatibility / testing etc, it seems like there is a belief that the 
> current ‘scoped’ features are all tested well in these areas and so adding 
> more is going to hurt the release. There is no way this is the reality, trunk 
> has so many features that have been landing for years, the only way we can 
> collectively attempt towards making this stable is by getting as many parties 
> together as possible, each verifying stuff that they need. Not by excluding 
> specific features.
> 

If everyone is confident & it's coming together, it does make sense. I think 
those of us (myself included) who are merging stuff in do have to recognise 
that we really need to follow it through by being responsive to any problem 
- and with the release manager having the right to pull things out if it's felt 
to be significantly threatening the stability of the final 3.0 release.

I think we should also consider making the 3.0 beta the feature freeze; after 
that, fixes to the features go in, but nothing else of significance. Otherwise 
the value of the beta ("test this code more broadly") is diminished.

-steve