Re: Planning Hadoop 2.6.1 release

2015-05-12 Thread Zhihai Xu
Hi Akira,

Can we also include YARN-3242? YARN-3242 fixed a critical ZKRMStateStore
bug.
It will work better with YARN-2992.

thanks
zhihai


On Tue, May 12, 2015 at 10:38 PM, Akira AJISAKA 
wrote:

> Thanks all for collecting jiras for 2.6.1 release. In addition, I'd like
> to include the following:
>
> * HADOOP-11343. Overflow is not properly handled in calculating final iv
> for AES CTR
> * YARN-2874. Dead lock in "DelegationTokenRenewer" which blocks RM to
> execute any further apps
> * YARN-2992. ZKRMStateStore crashes due to session expiry
> * YARN-3013. AMRMClientImpl does not update AMRM token properly
> * YARN-3369. Missing NullPointer check in AppSchedulingInfo causes RM to
> die
> * MAPREDUCE-6303. Read timeout when retrying a fetch error can be fatal to
> a reducer
>
> All of these are marked as blocker bugs for 2.7.0 but not fixed in 2.6.0.
>
> Regards,
> Akira
>
>
> On 5/4/15 11:15, Brahma Reddy Battula wrote:
>
>> Hello Vinod,
>>
>> I am thinking, can we include HADOOP-11491 also? Without this jira, harfs
>> will not be usable when the cluster is installed in HA mode and we try to
>> get a filecontext like below:
>>
>>
>> Path path = new Path("har:///archivedLogs/application_1428917727658_0005-application_1428917727658_0008-1428927448352.har");
>> FileSystem fs = path.getFileSystem(new Configuration());
>> path = fs.makeQualified(path);
>> FileContext fc = FileContext.getFileContext(path.toUri(), new Configuration());
>>
>>
>>
>> Thanks & Regards
>> Brahma Reddy Battula
>> 
>> From: Chris Nauroth [cnaur...@hortonworks.com]
>> Sent: Friday, May 01, 2015 4:32 AM
>> To: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org;
>> yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org
>> Subject: Re: Planning Hadoop 2.6.1 release
>>
>> Thank you, Arpit.  In addition, I suggest we include the following:
>>
>> HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification
>> pipe is full
>> HADOOP-11604. Prevent ConcurrentModificationException while closing domain
>> sockets during shutdown of DomainSocketWatcher thread.
>> HADOOP-11648. Set DomainSocketWatcher thread name explicitly
>> HADOOP-11802. DomainSocketWatcher thread terminates sometimes after there
>> is an I/O error during requestShortCircuitShm
>>
>> HADOOP-11604 and 11648 are not critical by themselves, but they are
>> pre-requisites to getting a clean cherry-pick of 11802, which we believe
>> finally fixes the root cause of this issue.
>>
>>
>> --Chris Nauroth
>>
>>
>>
>>
>> On 4/30/15, 3:55 PM, "Arpit Agarwal"  wrote:
>>
>>> HDFS candidates for back-porting to Hadoop 2.6.1. The first two were
>>> requested in [1].
>>>
>>> HADOOP-11674. oneByteBuf in CryptoInputStream and CryptoOutputStream
>>> should be non static
>>> HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt
>>> synchronization
>>>
>>> HDFS-7009. Active NN and standby NN have different live nodes.
>>> HDFS-7035. Make adding a new data directory to the DataNode an atomic and
>>> improve error handling
>>> HDFS-7425. NameNode block deletion logging uses incorrect appender.
>>> HDFS-7443. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate
>>> block files are present in the same volume.
>>> HDFS-7489. Incorrect locking in FsVolumeList#checkDirs can hang datanodes
>>> HDFS-7503. Namenode restart after large deletions can cause slow
>>> processReport.
>>> HDFS-7575. Upgrade should generate a unique storage ID for each volume.
>>> HDFS-7579. Improve log reporting during block report rpc failure.
>>> HDFS-7587. Edit log corruption can happen if append fails with a quota
>>> violation.
>>> HDFS-7596. NameNode should prune dead storages from storageMap.
>>> HDFS-7611. deleteSnapshot and delete of a file can leave orphaned blocks
>>> in the blocksMap on NameNode restart.
>>> HDFS-7714. Simultaneous restart of HA NameNodes and DataNode can cause
>>> DataNode to register successfully with only one NameNode.
>>> HDFS-7733. NFS: readdir/readdirplus return null directory attribute on
>>> failure.
>>> HDFS-7831. Fix the starting index and end condition of the loop in
>>> FileDiffList.findEarlierSnapshotBlocks().
>>> HDFS-7885. Datanode should not trust the generation stamp provided by
>>> client.
>>> HDFS-7960. The full block report should prune zombie storages even if
>>> they're not empty.
>>> HDFS-8072. Reserved RBW space is not released if client terminates while
>>> writing block.
>>> HDFS-8127. NameNode Failover during HA upgrade can cause DataNode to
>>> finalize upgrade.
>>>
>>>
>>> Arpit
>>>
>>> [1] Will Hadoop 2.6.1 be released soon?
>>> http://markmail.org/thread/zlsr6prejyogdyvh
>>>
>>>
>>>
>>> On 4/27/15, 11:47 AM, "Vinod Kumar Vavilapalli" 
>>> wrote:
>>>
>>>  There were several requests on the user lists [1] for a 2.6.1 release. I
 got many offline comments too.

 Planning to do a 2.6.1 release in a few weeks time. We already have a
 bunch
 of tickets committed to 2.7.1. I created 

Re: Planning Hadoop 2.6.1 release

2015-05-12 Thread Akira AJISAKA
Thanks all for collecting jiras for 2.6.1 release. In addition, I'd like 
to include the following:


* HADOOP-11343. Overflow is not properly handled in calculating final iv 
for AES CTR
* YARN-2874. Dead lock in "DelegationTokenRenewer" which blocks RM to 
execute any further apps

* YARN-2992. ZKRMStateStore crashes due to session expiry
* YARN-3013. AMRMClientImpl does not update AMRM token properly
* YARN-3369. Missing NullPointer check in AppSchedulingInfo causes RM to die
* MAPREDUCE-6303. Read timeout when retrying a fetch error can be fatal 
to a reducer


All of these are marked as blocker bugs for 2.7.0 but not fixed in 2.6.0.

Regards,
Akira

On 5/4/15 11:15, Brahma Reddy Battula wrote:

Hello Vinod,

I am thinking, can we include HADOOP-11491 also? Without this jira, harfs will 
not be usable when the cluster is installed in HA mode and we try to get a 
filecontext like below:


Path path = new Path("har:///archivedLogs/application_1428917727658_0005-application_1428917727658_0008-1428927448352.har");
FileSystem fs = path.getFileSystem(new Configuration());
path = fs.makeQualified(path);
FileContext fc = FileContext.getFileContext(path.toUri(), new Configuration());



Thanks & Regards
Brahma Reddy Battula

From: Chris Nauroth [cnaur...@hortonworks.com]
Sent: Friday, May 01, 2015 4:32 AM
To: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org
Subject: Re: Planning Hadoop 2.6.1 release

Thank you, Arpit.  In addition, I suggest we include the following:

HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification
pipe is full
HADOOP-11604. Prevent ConcurrentModificationException while closing domain
sockets during shutdown of DomainSocketWatcher thread.
HADOOP-11648. Set DomainSocketWatcher thread name explicitly
HADOOP-11802. DomainSocketWatcher thread terminates sometimes after there
is an I/O error during requestShortCircuitShm

HADOOP-11604 and 11648 are not critical by themselves, but they are
pre-requisites to getting a clean cherry-pick of 11802, which we believe
finally fixes the root cause of this issue.


--Chris Nauroth




On 4/30/15, 3:55 PM, "Arpit Agarwal"  wrote:


HDFS candidates for back-porting to Hadoop 2.6.1. The first two were
requested in [1].

HADOOP-11674. oneByteBuf in CryptoInputStream and CryptoOutputStream
should be non static
HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt
synchronization

HDFS-7009. Active NN and standby NN have different live nodes.
HDFS-7035. Make adding a new data directory to the DataNode an atomic and
improve error handling
HDFS-7425. NameNode block deletion logging uses incorrect appender.
HDFS-7443. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate
block files are present in the same volume.
HDFS-7489. Incorrect locking in FsVolumeList#checkDirs can hang datanodes
HDFS-7503. Namenode restart after large deletions can cause slow
processReport.
HDFS-7575. Upgrade should generate a unique storage ID for each volume.
HDFS-7579. Improve log reporting during block report rpc failure.
HDFS-7587. Edit log corruption can happen if append fails with a quota
violation.
HDFS-7596. NameNode should prune dead storages from storageMap.
HDFS-7611. deleteSnapshot and delete of a file can leave orphaned blocks
in the blocksMap on NameNode restart.
HDFS-7714. Simultaneous restart of HA NameNodes and DataNode can cause
DataNode to register successfully with only one NameNode.
HDFS-7733. NFS: readdir/readdirplus return null directory attribute on
failure.
HDFS-7831. Fix the starting index and end condition of the loop in
FileDiffList.findEarlierSnapshotBlocks().
HDFS-7885. Datanode should not trust the generation stamp provided by
client.
HDFS-7960. The full block report should prune zombie storages even if
they're not empty.
HDFS-8072. Reserved RBW space is not released if client terminates while
writing block.
HDFS-8127. NameNode Failover during HA upgrade can cause DataNode to
finalize upgrade.


Arpit

[1] Will Hadoop 2.6.1 be released soon?
http://markmail.org/thread/zlsr6prejyogdyvh



On 4/27/15, 11:47 AM, "Vinod Kumar Vavilapalli" 
wrote:


There were several requests on the user lists [1] for a 2.6.1 release. I
got many offline comments too.

Planning to do a 2.6.1 release in a few weeks time. We already have a
bunch
of tickets committed to 2.7.1. I created a filter [2] to track pending
tickets.

We need to collectively come up with a list of critical issues. We can
use
the JIRA Target Version field for the same. I see some but not a whole
lot
of new work for this release, most of it is likely going to be pulling in
critical patches from 2.7.1/2.8 etc.

Thoughts?

Thanks
+Vinod

[1] Will Hadoop 2.6.1 be released soon?
http://markmail.org/thread/zlsr6prejyogdyvh
[2] 2.6.1 pending tickets
https://issues.apache.org/jira/issues/?filter=12331711










[jira] [Created] (HADOOP-11967) SpanReceiverHost should be able to handle tracing configuration properties without prefix

2015-05-12 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created HADOOP-11967:
-

 Summary: SpanReceiverHost should be able to handle tracing 
configuration properties without prefix
 Key: HADOOP-11967
 URL: https://issues.apache.org/jira/browse/HADOOP-11967
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tracing
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11966) Variable cygwin is undefined in hadoop-config.sh when executed through hadoop-daemon.sh.

2015-05-12 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-11966:
--

 Summary: Variable cygwin is undefined in hadoop-config.sh when 
executed through hadoop-daemon.sh.
 Key: HADOOP-11966
 URL: https://issues.apache.org/jira/browse/HADOOP-11966
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.7.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Critical


HADOOP-11464 reinstated support for running the bash scripts through Cygwin.  
The logic involves setting a {{cygwin}} flag variable to indicate if the script 
is executing through Cygwin.  The flag is set in all of the interactive 
scripts: {{hadoop}}, {{hdfs}}, {{yarn}} and {{mapred}}.  The flag is not set 
through hadoop-daemon.sh though.  This can cause an erroneous overwrite of 
{{HADOOP_HOME}} and {{JAVA_LIBRARY_PATH}} inside hadoop-config.sh.





[jira] [Created] (HADOOP-11965) determine-flaky-tests needs a summary mode

2015-05-12 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-11965:
-

 Summary: determine-flaky-tests needs a summary mode
 Key: HADOOP-11965
 URL: https://issues.apache.org/jira/browse/HADOOP-11965
 Project: Hadoop Common
  Issue Type: Test
Reporter: Allen Wittenauer
Priority: Minor


Running determine-flaky-tests against PreCommit-HDFS-Build generates just 
under 10,000 lines of output.  That's far too much to be useful for the 
casual user.  It's also not formatted in a way that is easily 
machine-parseable for generating a decent report.  A summary mode should be 
added that just prints out the top X failed tests.





[jira] [Created] (HADOOP-11964) determine-flaky-tests makes invalid test assumptions

2015-05-12 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-11964:
-

 Summary: determine-flaky-tests makes invalid test assumptions
 Key: HADOOP-11964
 URL: https://issues.apache.org/jira/browse/HADOOP-11964
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Allen Wittenauer


When running determine-flaky-tests against precommit-hadoop-build, it throws a 
lot of errors because it assumes that every job is actually running Java tests. 
 There should be some way to make it not do that or at least fix its 
assumptions.





HADOOP-11933: Jenkins pre-commit in a docker container

2015-05-12 Thread Allen Wittenauer

Hi.

A few times now, we’ve run into issues where we weren’t sure what state 
the surrounding environment was in when a pre-commit patch test was run.  
Additionally, there are times when installing or even updating a component is 
a challenge.  As a result, there are some bits that we do not compile or even 
test as part of the Jenkins run.

After a bit less than a week of work, I’ve managed to get test-patch.sh 
smart enough to launch and re-exec itself inside a docker container.  The 
container definition is part of the source tree.  This effectively means that, 
after HADOOP-11933 is committed, we’ll be able to have a much greater sense of 
control over the exact environment that is running during patch test time.  
We’ll be able to easily add/remove components as necessary.

Currently, HADOOP-11933 is awaiting review.  But I thought I’d pop this 
message out here so that more people are aware of the patch and can raise any 
thoughts/concerns/feature requests/etc. prior to any potential commit. 
It should be noted that Jenkins’ precommit has the flags configured such that 
when test-patch.sh re-exec’s itself to test the patch, it does so in a docker 
container. In other words, the docker container patch is testing itself in a 
docker container. :)  (Other patches ignore those flags since this patch isn’t 
live yet.)

Thanks.

[jira] [Created] (HADOOP-11963) Metrics documentation for FSNamesystem misspells PendingDataNodeMessageCount.

2015-05-12 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-11963:
--

 Summary: Metrics documentation for FSNamesystem misspells 
PendingDataNodeMessageCount.
 Key: HADOOP-11963
 URL: https://issues.apache.org/jira/browse/HADOOP-11963
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Chris Nauroth
Assignee: Anu Engineer
Priority: Trivial


http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/Metrics.html#FSNamesystem

{quote}
PendingDataNodeMessageCourt (HA-only) Current number of pending 
block-related messages for later processing in the standby NameNode
{quote}

This needs to be changed to "PendingDataNodeMessageCount".





[jira] [Resolved] (HADOOP-11961) Add interface of whether codec has chunk boundary to Erasure coder

2015-05-12 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu resolved HADOOP-11961.
-
Resolution: Invalid

> Add interface of whether codec has chunk boundary to Erasure coder
> --
>
> Key: HADOOP-11961
> URL: https://issues.apache.org/jira/browse/HADOOP-11961
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: io
>Reporter: Yi Liu
>






[jira] [Created] (HADOOP-11962) Sasl message with MD5 challenge text shouldn't be LOG as debug level.

2015-05-12 Thread Junping Du (JIRA)
Junping Du created HADOOP-11962:
---

 Summary: Sasl message with MD5 challenge text shouldn't be LOG as 
debug level.
 Key: HADOOP-11962
 URL: https://issues.apache.org/jira/browse/HADOOP-11962
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc, security
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical


Some log examples:
{noformat}
2014-09-24 05:42:12,975 DEBUG security.SaslRpcServer 
(SaslRpcServer.java:create(174)) - Created SASL server with mechanism = 
DIGEST-MD5
2014-09-24 05:42:12,977 DEBUG ipc.Server (Server.java:doSaslReply(1424)) - 
Sending sasl message state: NEGOTIATE
auths {
  method: "TOKEN"
  mechanism: "DIGEST-MD5"
  protocol: ""
  serverId: "default"
  challenge: 
"realm=\"default\",nonce=\"yIvZDpbzGGq3yIrMynVKnEv9Z0qw6lxpr9nZxm0r\",qop=\"auth\",charset=utf-8,algorithm=md5-sess"
}
...
...
2014-09-24 06:21:59,146 DEBUG ipc.Server (Server.java:doSaslReply(1424)) - 
Sending sasl message state: CHALLENGE
token: 
"`l\006\t*\206H\206\367\022\001\002\002\002\000o]0[\240\003\002\001\005\241\003\002\001\017\242O0M\240\003\002\001\020\242F\004D#\030\336|kb\232\033V\340\342F\334\230\347\230\362)u!=\215\271\006\244:\244\221vn\215*\323\353\360\350\3006\366\3340\245\371Ri\273\374\307\017\207Z\233\326\217\224!yo$\373\233\315:JsY!^?"
{noformat}
We should get rid of this kind of log in production environments, even at 
debug log level.
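One way to do this (a sketch only; this helper is hypothetical and not the actual fix committed for HADOOP-11962) is to redact the sensitive challenge/token payload before the message is ever handed to the logger, so the bytes never reach the log regardless of level:

```java
public class SaslLogRedactor {
    // Replace everything after a "challenge:" or "token:" field with a fixed
    // placeholder, so the sensitive payload never appears in the log output.
    public static String redact(String saslLogMessage) {
        return saslLogMessage.replaceAll(
                "(challenge|token):\\s*\".*", "$1: \"<redacted>\"");
    }

    public static void main(String[] args) {
        // Non-sensitive state lines pass through unchanged.
        System.out.println(redact("Sending sasl message state: NEGOTIATE"));
        // Sensitive payloads are masked before logging.
        System.out.println(redact("token: \"`l\\006\\t*...secret bytes...\""));
    }
}
```

The point is that redaction happens at the call site, not via log-level filtering, since operators often enable debug logging in production while troubleshooting.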





Jenkins build is back to normal : Hadoop-Common-trunk #1493

2015-05-12 Thread Apache Jenkins Server
See 



Re: H10 and H11

2015-05-12 Thread Steve Loughran

> On 12 May 2015, at 00:02, Allen Wittenauer  wrote:
> 
> 
>   Anyone know if these machines are supposed to be in the Ubuntu pool on 
> Jenkins?  Given they are H-machines, shouldn’t they been in the Hadoop pool?  
> Right now, we’ve got stuff waiting with zero test slots available while the 
> Ubuntu pool has slots open that we can’t use. :(
> 
> 
 you asked on the builds@ list?


[jira] [Created] (HADOOP-11961) Add isLinear interface to Erasure coder

2015-05-12 Thread Yi Liu (JIRA)
Yi Liu created HADOOP-11961:
---

 Summary: Add isLinear interface to Erasure coder
 Key: HADOOP-11961
 URL: https://issues.apache.org/jira/browse/HADOOP-11961
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu


Today we had a discussion including [~zhz], [~drankye], etc.; this is also 
discussed in HDFS-8347.
Some coders, like {{RS}} and {{XOR}}, are linear, while some, like HitchHiker, 
have a coding boundary.  If the coder is linear, we can decode at any size and 
don't need to pad the inputs to *chunksize*; if the coder is not linear, the 
inputs need to be padded to *chunksize* before decoding.

This interface is important for performance, and can save memory/disk space 
since the parity cells are the same size as the first data cell (less than the 
codec chunksize).
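A minimal sketch of how a caller might use such a property (the isLinear flag and helper method below are hypothetical illustrations, not the actual Hadoop erasure-coder API):

```java
import java.util.Arrays;

public class CoderPaddingExample {
    // Hypothetical rule from the discussion above: a linear coder (e.g. RS,
    // XOR) can decode inputs of any length, so no padding is needed; a
    // non-linear coder requires inputs zero-padded up to the chunk size.
    public static byte[] prepareInput(byte[] input, boolean isLinear, int chunkSize) {
        if (isLinear || input.length % chunkSize == 0) {
            return input; // decode as-is
        }
        int paddedLen = ((input.length / chunkSize) + 1) * chunkSize;
        return Arrays.copyOf(input, paddedLen); // zero-pad to the chunk boundary
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        System.out.println(prepareInput(data, true, 64).length);  // 100: linear, untouched
        System.out.println(prepareInput(data, false, 64).length); // 128: padded to chunksize
    }
}
```

This is where the memory/disk saving comes from: the linear path avoids allocating and writing the padded copy entirely.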


