[jira] [Created] (HADOOP-16526) Support LDAP authentication (bind) via GSSAPI

2019-08-21 Thread Todd Lipcon (Jira)
Todd Lipcon created HADOOP-16526:


 Summary: Support LDAP authentication (bind) via GSSAPI
 Key: HADOOP-16526
 URL: https://issues.apache.org/jira/browse/HADOOP-16526
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Todd Lipcon


Currently the LDAP group mapping provider only supports simple (user/password) 
authentication. In some cases it's more convenient to use GSSAPI (kerberos) 
authentication here, particularly when the server doing the mapping is already 
using a keytab provided by the same instance (eg IPA or AD). We should provide 
a configuration to turn on GSSAPI and put the right UGI 'doAs' calls in place 
to ensure an appropriate Subject in those calls.
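
A minimal sketch of how such a bind could look, assuming the process has already logged in from a keytab so the UGI carries Kerberos credentials; the class and method names here are hypothetical, while the JNDI environment keys and UGI calls are real APIs:

{code:java}
import java.security.PrivilegedExceptionAction;
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import org.apache.hadoop.security.UserGroupInformation;

public class GssapiLdapBind {
  // Hypothetical helper: bind to LDAP via SASL/GSSAPI using the login
  // user's Kerberos credentials instead of a simple bind password.
  public static DirContext bind(final String ldapUrl) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    return ugi.doAs(new PrivilegedExceptionAction<DirContext>() {
      @Override
      public DirContext run() throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
            "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        // "GSSAPI" makes JNDI authenticate with the Kerberos Subject
        // that doAs has associated with the current thread.
        env.put(Context.SECURITY_AUTHENTICATION, "GSSAPI");
        return new InitialDirContext(env);
      }
    });
  }
}
{code}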






[jira] [Created] (HADOOP-16525) LDAP group mapping should include primary posix group

2019-08-21 Thread Todd Lipcon (Jira)
Todd Lipcon created HADOOP-16525:


 Summary: LDAP group mapping should include primary posix group
 Key: HADOOP-16525
 URL: https://issues.apache.org/jira/browse/HADOOP-16525
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Todd Lipcon
Assignee: Todd Lipcon


When configuring LdapGroupsMapping against FreeIPA, the current implementation 
searches for groups which have the user listed as a member. This catches all 
"secondary" groups but misses the user's primary group (typically the same name 
as their username). We should include a search for a group matching the user's 
primary gidNumber in the group search.
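
As an illustration, the extra search could be keyed on the user's gidNumber attribute. This is a hedged sketch (the helper name is mine), using the standard posixAccount/posixGroup schema that FreeIPA follows:

{code:java}
import javax.naming.NamingException;
import javax.naming.directory.SearchResult;

public class PrimaryGroupFilter {
  // Builds an RFC 4515 filter matching the posixGroup whose gidNumber
  // equals the user's primary gid; assumes the user entry carries a
  // gidNumber attribute, as posixAccount entries do.
  public static String forUser(SearchResult userEntry) throws NamingException {
    String gid = userEntry.getAttributes().get("gidNumber").get().toString();
    return String.format("(&(objectClass=posixGroup)(gidNumber=%s))", gid);
  }
}
{code}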






[jira] [Created] (HADOOP-16179) hadoop-common pom should not depend on kerb-simplekdc

2019-03-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-16179:


 Summary: hadoop-common pom should not depend on kerb-simplekdc
 Key: HADOOP-16179
 URL: https://issues.apache.org/jira/browse/HADOOP-16179
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Todd Lipcon


The hadoop-common pom currently has a dependency on kerb-simplekdc. In fact, 
the only classes used from Kerby are in kerb-core and kerb-util (which is a 
transitive dependency from kerb-core). Depending on kerb-simplekdc pulls a 
bunch of other unnecessary classes into the hadoop-common classpath.






[jira] [Created] (HADOOP-16011) OsSecureRandom very slow compared to other SecureRandom implementations

2018-12-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-16011:


 Summary: OsSecureRandom very slow compared to other SecureRandom 
implementations
 Key: HADOOP-16011
 URL: https://issues.apache.org/jira/browse/HADOOP-16011
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Todd Lipcon


In looking at performance of a workload which creates a lot of short-lived 
remote connections to a secured DN, [~philip] and I found very high system CPU 
usage. We tracked it down to reads from /dev/random, which are incurred by the 
DN using CryptoCodec.generateSecureRandom to generate a transient session key 
and IV for AES encryption.

In the case that the OpenSSL codec is not enabled, the above code falls through 
to the JDK SecureRandom implementation, which performs reasonably. However, 
OpenSSLCodec defaults to using OsSecureRandom, which reads all random data from 
/dev/random rather than doing something more efficient like initializing a 
CSPRNG from a small seed.

I wrote a simple JMH benchmark to compare various approaches when running with 
concurrency 10:
 testHadoop - using CryptoCodec
 testNewSecureRandom - using 'new SecureRandom()' each iteration
 testSha1PrngNew - using the SHA1PRNG explicitly, new instance each iteration
 testSha1PrngShared - using a single shared instance of SHA1PRNG
 testSha1PrngThread - using a thread-specific instance of SHA1PRNG
{code:java}
Benchmark                        Mode  Cnt        Score  Error  Units
MyBenchmark.testHadoop           thrpt        1293.000         ops/s  [with libhadoop.so]
MyBenchmark.testHadoop           thrpt      461515.697         ops/s  [without libhadoop.so]
MyBenchmark.testNewSecureRandom  thrpt       43413.640         ops/s
MyBenchmark.testSha1PrngNew      thrpt      395515.000         ops/s
MyBenchmark.testSha1PrngShared   thrpt      164488.713         ops/s
MyBenchmark.testSha1PrngThread   thrpt     4295123.210         ops/s
{code}

In other words, the presence of the OpenSSL acceleration slows down this code 
path by 356x. And, compared to the optimal (thread-local Sha1Prng) it's 3321x 
slower.
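
For reference, the fastest variant above (testSha1PrngThread) amounts to something like the following sketch; the class and method names are mine, not from the benchmark:

{code:java}
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class ThreadLocalPrng {
  // One CSPRNG per thread: seeding happens once per thread, after which
  // nextBytes() never goes back to /dev/random.
  private static final ThreadLocal<SecureRandom> PRNG =
      ThreadLocal.withInitial(() -> {
        try {
          return SecureRandom.getInstance("SHA1PRNG");
        } catch (NoSuchAlgorithmException e) {
          throw new RuntimeException(e);
        }
      });

  public static void fill(byte[] keyOrIv) {
    PRNG.get().nextBytes(keyOrIv);
  }
}
{code}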






Re: [DISCUSS] Hadoop RPC encryption performance improvements

2018-11-02 Thread Todd Lipcon
One possibility (which we use in Kudu) is to use SSL for encryption but
with a self-signed certificate, maintaining the existing SASL/GSSAPI
handshake for authentication. The one important bit here, security wise, is
to implement channel binding (RFC 5056 and RFC 5929) to prevent against
MITMs. The description of the Kudu protocol is here:
https://github.com/apache/kudu/blob/master/docs/design-docs/rpc.md#wire-protocol

If implemented correctly, this provides TLS encryption (with all of its
performance and security benefits) without requiring the user to deploy a
custom cert.
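
For a rough idea of the channel-binding piece: RFC 5929's "tls-server-end-point" binding is a hash of the server certificate that both sides mix into the SASL handshake, so a MITM terminating TLS with a different cert breaks the handshake. A minimal sketch (SHA-256 is hardcoded here; the RFC derives the hash from the certificate's signature algorithm, upgrading MD5/SHA-1 to SHA-256):

{code:java}
import java.security.MessageDigest;
import java.security.cert.X509Certificate;

public class ChannelBindings {
  // Hash of the server's certificate, to be fed into the SASL/GSSAPI
  // handshake as channel-binding data on both ends of the connection.
  public static byte[] tlsServerEndPoint(X509Certificate serverCert)
      throws Exception {
    return MessageDigest.getInstance("SHA-256")
        .digest(serverCert.getEncoded());
  }
}
{code}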

-Todd

On Thu, Nov 1, 2018 at 7:14 PM Konstantin Shvachko 
wrote:

> Hi Wei-Chiu,
>
> Thanks for starting the thread and summarizing the problem. Sorry for slow
> response.
> We've been looking at the encrypted performance as well and are interested
> in this effort.
> We ran some benchmarks locally. Our benchmarks also showed substantial
> penalty for turning on wire encryption on rpc.
> Although it was less drastic - more in the range of -40%. But we ran a
> different benchmark, NNThroughputBenchmark, and we ran it on 2.6 last year.
> Could have published the results, but need to rerun on more recent
> versions.
>
> Three points from me on this discussion:
>
> 1. We should settle on the benchmarking tools.
> For development RPCCallBenchmark is good as it measures directly the
> improvement on the RPC layer. But for external consumption it is more
> important to know about e.g. NameNode RPCs performance. So we probably
> should run both benchmarks.
> 2. SASL vs SSL.
> Since the current implementation is based on SASL, I think it would make sense
> to make improvements in this direction. I assume switching to SSL would
> require changes in configuration. Not sure if it will be compatible, since
> we don't have the details. At this point I would go with HADOOP-10768.
> Given all (Daryn's) concerns are addressed.
> 3. Performance improvement expectations.
> Ideally we want to have < 10% penalty for encrypted communication. Anything
> over 30% will probably have very limited usability. And there is the gray
> area in between, which could be mitigated by allowing mixed encrypted and
> un-encrypted RPCs on the single NameNode like in HDFS-13566.
>
> Thanks,
> --Konstantin
>
> On Wed, Oct 31, 2018 at 7:39 AM Daryn Sharp 
> wrote:
>
> > Various KMS tasks have been delaying my RPC encryption work – which is
> 2nd
> > on TODO list.  It's becoming a top priority for us so I'll try my best to
> > get a preliminary netty server patch (sans TLS) up this week if that
> helps.
> >
> > The two cited jiras had some critical flaws.  Skimming my comments, both
> > use blocking IO (obvious nonstarter).  HADOOP-10768 is a hand rolled
> > TLS-like encryption which I don't feel is something the community can or
> > should maintain from a security standpoint.
> >
> > Daryn
> >
> > On Wed, Oct 31, 2018 at 8:43 AM Wei-Chiu Chuang 
> > wrote:
> >
> > > Ping. Any one? Cloudera is interested in moving forward with the RPC
> > > encryption improvements, but I just like to get a consensus which
> > approach
> > > to go with.
> > >
> > > Otherwise I'll pick HADOOP-10768 since it's ready for commit, and I've
> > > spent time on testing it.
> > >
> > > On Thu, Oct 25, 2018 at 11:04 AM Wei-Chiu Chuang 
> > > wrote:
> > >
> > > > Folks,
> > > >
> > > > I would like to invite all to discuss the various Hadoop RPC
> encryption
> > > > performance improvements. As you probably know, Hadoop RPC encryption
> > > > currently relies on Java SASL, and has _really_ bad performance (in
> > > terms
> > > > of number of RPCs per second, around 15~20% of the one without SASL)
> > > >
> > > > There have been some attempts to address this, most notably,
> > HADOOP-10768
> > > > <https://issues.apache.org/jira/browse/HADOOP-10768> (Optimize
> Hadoop
> > > RPC
> > > > encryption performance) and HADOOP-13836
> > > > <https://issues.apache.org/jira/browse/HADOOP-13836> (Securing
> Hadoop
> > > RPC
> > > > using SSL). But it looks like both attempts have not been
> progressing.
> > > >
> > > > During the recent Hadoop contributor meetup, Daryn Sharp mentioned
> he's
> > > > working on another approach that leverages Netty for its SSL
> > encryption,
> > > > and then integrate Netty with Hadoop RPC so that Hadoop RPC
> > automatically
> > > > benefits from netty's SSL encryption performance.
> > > >
> > > > So there are at least 3 attempts to address this issue as I see it.
> Do
> > we
> > > > have a consensus that:
> > > > 1. this is an important problem
> > > > 2. which approach we want to move forward with
> > > >
> > > > --
> > > > A very happy Hadoop contributor
> > > >
> > >
> > >
> > > --
> > > A very happy Hadoop contributor
> > >
> >
> >
> > --
> >
> > Daryn
> >
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS]: securing ASF Hadoop releases out of the box

2018-07-05 Thread Todd Lipcon
gs to think about
> >
> > * docs explaining IN CAPITAL LETTERS why you need to lock down your
> > cluster to a private subnet or use Kerberos
> > * Anything which can be done to make Kerberos easier (?). I see
> there are
> > some outstanding patches for HADOOP-12649 which need review, but what
> else?
> >
> > Could we have Hadoop determine when it's coming up on an open
> network and
> > start warning? And how?
> >
> > At the very least, single node hadoop should be locked down. You
> shouldn't
> > have to bring up kerberos to run it like that. And for more
> sophisticated
> > multinode deployments, should the scripts refuse to work without
> kerberos
> > unless you pass in some argument like "--Dinsecure-clusters-
> permitted"
> >
> > Any other ideas?
> >
> >
> >
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-15566) Remove HTrace support

2018-06-27 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-15566:


 Summary: Remove HTrace support
 Key: HADOOP-15566
 URL: https://issues.apache.org/jira/browse/HADOOP-15566
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Todd Lipcon


The HTrace incubator project has voted to retire itself and won't be making 
further releases. The Hadoop project currently has various hooks with HTrace. 
It seems in some cases (eg HDFS-13702) these hooks have had measurable 
performance overhead. Given these two factors, I think we should consider 
removing the HTrace integration. If there is someone willing to do the work, 
replacing it with OpenTracing might be a better choice since there is an active 
community.






[jira] [Created] (HADOOP-15564) Classloading Shell should not run a subprocess

2018-06-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-15564:


 Summary: Classloading Shell should not run a subprocess
 Key: HADOOP-15564
 URL: https://issues.apache.org/jira/browse/HADOOP-15564
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 3.0.0
Reporter: Todd Lipcon


The 'Shell' class has a static member isSetsidSupported which, in order to 
initialize, forks out a subprocess. Various other parts of the code reference 
Shell.WINDOWS. For example, the StringUtils class has such a reference. This 
means that, during startup, a seemingly fast call like 
Configuration.getBoolean() ends up class-loading StringUtils, which class-loads 
Shell, which forks out a subprocess. I couldn't measure any big improvement by 
fixing this, but it seemed surprising to say the least.
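
One way to defer the probe is a lazy holder, so that merely class-loading Shell stays cheap; this is a sketch of the idea under hypothetical names, not the actual patch:

{code:java}
public class ShellSketch {
  // The fork now happens only when isSetsidSupported() is first called,
  // not when the class is loaded via StringUtils or Configuration.
  private static final class SetsidHolder {
    static final boolean SUPPORTED = probeSetsid(); // forks here, once
  }

  public static boolean isSetsidSupported() {
    return SetsidHolder.SUPPORTED;
  }

  private static boolean probeSetsid() {
    try {
      return new ProcessBuilder("setsid", "true").start().waitFor() == 0;
    } catch (Exception e) {
      return false;
    }
  }
}
{code}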






[jira] [Created] (HADOOP-15557) Crypto streams should not crash when mis-used

2018-06-22 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-15557:


 Summary: Crypto streams should not crash when mis-used
 Key: HADOOP-15557
 URL: https://issues.apache.org/jira/browse/HADOOP-15557
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.2.0
Reporter: Todd Lipcon


In general, the non-positional read APIs for streams in Hadoop Common are meant 
to be used by only a single thread at a time. It would not make much sense to 
have concurrent multi-threaded access to seek+read because they modify the 
stream's file position. Multi-threaded access on input streams can be done 
using positional read APIs. Multi-threaded access on output streams probably 
never makes sense.

In the case of DFSInputStream, the positional read APIs are marked 
synchronized, so that even when misused, no strange exceptions are thrown. The 
results are just somewhat undefined in that it's hard for a thread to know 
which position was read from. However, when running on an encrypted file 
system, the results are much worse: since CryptoInputStream's read methods are 
not marked synchronized, the caller can get strange ByteBuffer exceptions or 
even a JVM crash due to concurrent use and free of underlying OpenSSL Cipher 
buffers.

The crypto stream wrappers should be made more resilient to such misuse, for 
example by:
(a) making the read methods safer by making them synchronized (so they have the 
same behavior as DFSInputStream)
or
(b) trying to detect concurrent access to these methods and throwing 
ConcurrentModificationException so that the user is alerted to their probable 
misuse.
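
A minimal sketch of option (b), assuming a compare-and-set guard around the unsynchronized read path (names are illustrative):

{code:java}
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicBoolean;

public class GuardedRead {
  private final AtomicBoolean inRead = new AtomicBoolean(false);

  // Two threads calling read() concurrently get a clear exception
  // instead of corrupting the shared cipher buffers.
  public int read(byte[] buf, int off, int len) {
    if (!inRead.compareAndSet(false, true)) {
      throw new ConcurrentModificationException(
          "concurrent unsynchronized read on a crypto stream");
    }
    try {
      return doRead(buf, off, len);
    } finally {
      inRead.set(false);
    }
  }

  private int doRead(byte[] buf, int off, int len) {
    return -1; // placeholder for the real decrypt-and-copy work
  }
}
{code}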






[jira] [Created] (HADOOP-15554) Improve JIT performance for Configuration parsing

2018-06-21 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-15554:


 Summary: Improve JIT performance for Configuration parsing
 Key: HADOOP-15554
 URL: https://issues.apache.org/jira/browse/HADOOP-15554
 Project: Hadoop Common
  Issue Type: Improvement
  Components: conf, performance
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


In investigating a performance regression for small tasks between Hadoop 2 and 
Hadoop 3, we found that the amount of time spent in JIT was significantly 
higher. Using jitwatch we were able to determine that, due to a combination of 
switching from DOM to SAX style parsing and just having more configuration 
key/value pairs, Configuration.loadResource is now getting compiled with the C2 
compiler and taking quite some time. Breaking that very large function up into 
several smaller ones and eliminating some redundant bits of code improves the 
JIT performance measurably.






[jira] [Created] (HADOOP-15551) Avoid use of Java8 streams in Configuration.addTags

2018-06-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-15551:


 Summary: Avoid use of Java8 streams in Configuration.addTags
 Key: HADOOP-15551
 URL: https://issues.apache.org/jira/browse/HADOOP-15551
 Project: Hadoop Common
  Issue Type: Improvement
  Components: performance
Affects Versions: 3.2
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Configuration.addTags oddly uses Arrays.stream instead of a more conventional 
mechanism. When profiling a simple program that uses Configuration, I found 
that addTags was taking tens of millis of CPU to do very little work the first 
time it's called, accounting for ~8% of total profiler samples in my program.

{code}
[9] 4.52% 253 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkCallSite
[9] 3.71% 208 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkMethodHandleConstant
{code}

I don't know much about the implementation details of the Streams stuff, but it 
seems it's probably meant more for cases with very large arrays or somesuch. 
Switching to a normal Set.addAll() call eliminates this from the profile.
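
For illustration, the two shapes side by side; this is a standalone sketch, not the actual Configuration code:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class AddTagsSketch {
  static final Set<String> TAGS = new HashSet<>();

  // The pattern in question: the first use of the streams machinery pays
  // a one-time invokedynamic/lambda-linking cost, which is what shows up
  // as linkCallSite/linkMethodHandleConstant in the profile.
  static void addTagsViaStream(String[] tags) {
    Arrays.stream(tags).forEach(TAGS::add);
  }

  // The plain replacement: same result, no stream setup cost.
  static void addTagsPlain(String[] tags) {
    Collections.addAll(TAGS, tags);
  }
}
{code}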






[jira] [Created] (HADOOP-15550) Avoid static initialization of ObjectMappers

2018-06-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-15550:


 Summary: Avoid static initialization of ObjectMappers
 Key: HADOOP-15550
 URL: https://issues.apache.org/jira/browse/HADOOP-15550
 Project: Hadoop Common
  Issue Type: Bug
  Components: performance
Affects Versions: 3.2.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Various classes statically initialize an ObjectMapper READER instance. This 
ends up doing a bunch of class-loading of Jackson libraries that can add up to 
a fair amount of CPU, even if the reader ends up not being used. This is 
particularly the case with WebHdfsFileSystem, which is class-loaded by a 
serviceloader even when unused in a particular job. We should lazy-init these 
members instead of doing so as a static class member.
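
A sketch of the lazy-init pattern, assuming Jackson's ObjectReader API; the holder idiom keeps initialization thread-safe without locking on each access:

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import java.util.Map;

public class LazyReaderSketch {
  // Jackson classes load only when reader() is first called, not when
  // this class is loaded by a serviceloader scan.
  private static final class Holder {
    static final ObjectReader READER = new ObjectMapper().readerFor(Map.class);
  }

  static ObjectReader reader() {
    return Holder.READER;
  }
}
{code}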






[jira] [Created] (HADOOP-15549) Upgrade to commons-configuration 2.1 regresses task CPU consumption

2018-06-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-15549:


 Summary: Upgrade to commons-configuration 2.1 regresses task CPU 
consumption
 Key: HADOOP-15549
 URL: https://issues.apache.org/jira/browse/HADOOP-15549
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 3.0.2
Reporter: Todd Lipcon
Assignee: Todd Lipcon


HADOOP-13660 upgraded from commons-configuration 1.x to 2.x. 
commons-configuration is used when parsing the metrics configuration properties 
file. The new builder API used in the new version apparently makes use of a 
bunch of very bloated reflection and classloading nonsense to achieve the same 
goal, and this results in a regression of >100ms of CPU time as measured by a 
program which simply initializes DefaultMetricsSystem.

This isn't a big deal for long-running daemons, but for MR tasks which might 
only run a few seconds on poorly-tuned jobs, this can be noticeable.






[jira] [Resolved] (HADOOP-9545) Improve logging in ActiveStandbyElector

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-9545.
-
Resolution: Won't Fix

> Improve logging in ActiveStandbyElector
> ---
>
> Key: HADOOP-9545
> URL: https://issues.apache.org/jira/browse/HADOOP-9545
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.1.0-beta
>    Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
>
> The ActiveStandbyElector currently logs a lot of stuff at DEBUG level which 
> would be useful for troubleshooting. We've seen one instance in the wild of a 
> ZKFC thinking it should be in standby state when in fact it won the election, 
> but the logging is insufficient to understand why. I'd like to bump most of 
> the existing DEBUG logs to INFO and add some additional logs as well.






[jira] [Resolved] (HADOOP-10859) Native implementation of java Checksum interface

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-10859.
--
Resolution: Won't Fix

No plans to work on this.

> Native implementation of java Checksum interface
> 
>
> Key: HADOOP-10859
> URL: https://issues.apache.org/jira/browse/HADOOP-10859
> Project: Hadoop Common
>  Issue Type: Improvement
>        Reporter: Todd Lipcon
>    Assignee: Todd Lipcon
>Priority: Minor
>
> Some parts of our code such as IFileInputStream/IFileOutputStream use the 
> java Checksum interface to calculate/verify checksums. Currently we don't 
> have a native implementation of these. For CRC32C in particular, we can get a 
> very big speedup with a native implementation.






Version control link on webpage

2015-05-17 Thread Todd Lipcon
https://hadoop.apache.org/version_control.html

Still seems to point to SVN instead of git.

Is the site itself still versioned from the SVN repository or has that also
transferred to git so we can fix this?

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Git repo ready to use

2014-09-15 Thread Todd Lipcon
, 2014 at 4:21 PM, Karthik Kambatla ka...@cloudera.com wrote:

 Thanks Giri and Ted for fixing the builds.

 On Thu, Aug 28, 2014 at 9:49 AM, Ted Yu yuzhih...@gmail.com wrote:

  Charles:
  QA build is running for your JIRA:
  https://builds.apache.org/job/PreCommit-hdfs-Build/7828/parameters/

  Cheers

  On Thu, Aug 28, 2014 at 9:41 AM, Charles Lamb cl...@cloudera.com wrote:

   On 8/28/2014 12:07 PM, Giridharan Kesavan wrote:
    Fixed all the 3 pre-commit builds. test-patch's git reset --hard is
    removing the patchprocess dir, so moved it off the workspace.
   Thanks Giri. Should I resubmit HDFS-6954's patch? I've gotten 3 or 4
   jenkins messages that indicated the problem so something is
   resubmitting, but now that you've fixed it, should I resubmit it again?

   Charles



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: migrating private branches to the new git repo

2014-09-02 Thread Todd Lipcon
On Tue, Sep 2, 2014 at 2:38 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Not to derail the conversation, but if CHANGES.txt is making backports more
 annoying, why don't we get rid of it? It seems like we should be able to
 generate it via a JIRA query, and git log can also be used for a quick
 check (way faster than svn log).


+1, I've always found CHANGES.txt to be a big pain in the butt, and often
it gets incorrect, too.




 On Tue, Sep 2, 2014 at 12:38 PM, Steve Loughran ste...@hortonworks.com
 wrote:

  I've now done my first commits; one into trunk (10373), one into branch-2
  and cherry picked (fix in
  hadoop-common-project/hadoop-common/src/main/native/README ; no JIRA).
 
  I made an initial attempt to cherry pick the HADOOP-10373 patch from
 trunk
  into branch-2, with CHANGES.TXT being a dramatic enough change that it
  takes human intervention to patch.
 
  implication
 
 
 1. committing to branch-2 with changes.txt in the same commit followed
 by a cherry pick forwards works.
 2. committing to trunk only backports reliably if the changes.txt
 files
 are patched in a separate commit
 
  This is no different from SVN, except that an svn merge used different
  commands.
 
  I have not tried the git format-patch/git am option, which would be:
 
 
 1. -use git am -3 to apply the patch to the HEAD of both branch-2 and
 trunk
 2. -patch changes.txt in each branch, then either commit separately
 3. -or try and amend latest commit for the patches
 
  #3 seems appealing, but it'd make the diff on the two branches different.
 
 
 
  On 2 September 2014 19:01, Andrew Wang andrew.w...@cloudera.com wrote:
 
   This is basically what I did, make patches of each of my branches and
  then
   reapply to the new trunk. One small recommendation would be to make the
   remote named apache rather than asflive so it's consistent with the
   GitAndHadoop wikipage. IMO naming branches with a / (e.g.
 live/trunk)
   is also kind of ambiguous, since it's the same syntax used to specify a
   remote. It seems there can also be difficulties with directory and
   filenames.
  
   Somewhat related, it'd be nice to update the GitAndHadoop instructions
 on
   how to generate a patch using git-format-patch. I've been using plain
 old
   git diff for a while, but format-patch seems better. It'd be
 especially
   nice if a recommended .gitconfig section was made available :)
  
   I plan to play with format-patch some in the near future and might do
  this
   myself, but if any git gurus already have this ready to go, feel free
 to
   edit.
  
  
   On Tue, Sep 2, 2014 at 4:10 AM, Steve Loughran ste...@hortonworks.com
 
   wrote:
  
Now that hadoop is using git, I'm migrating my various
 work-in-progress
branches to the new commit tree
   
   
1. This is the process I've written up for using git format-patch
 then
   git
am to export the patch sequence and merge it in, then rebasing onto
  trunk
to finally get in sync
   
https://wiki.apache.org/hadoop/MigratingPrivateGitBranches
   
2. The Git and hadoop docs cover git graft:
   
   
  
 
 https://wiki.apache.org/hadoop/GitAndHadoop#Grafts_for_complete_project_history
   
I'm not sure if/how that relates
   
Is there any easier way than what I've described for doing the move?
   
   
  
 
 




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Updates on migration to git

2014-08-26 Thread Todd Lipcon




-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-10882) Move DirectBufferPool into common util

2014-07-22 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-10882:


 Summary: Move DirectBufferPool into common util
 Key: HADOOP-10882
 URL: https://issues.apache.org/jira/browse/HADOOP-10882
 Project: Hadoop Common
  Issue Type: Task
  Components: util
Affects Versions: 2.6.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


MAPREDUCE-2841 uses a direct buffer pool to pass data back and forth between 
native and Java code. The branch has an implementation which appears to be 
derived from the one in HDFS. Instead of copy-pasting, we should move the HDFS 
DirectBufferPool into Common so that MR can make use of it.
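
For context, the pool's contract is roughly the following; this sketch is simplified (the HDFS version also uses weak references so pooled buffers can be reclaimed under memory pressure):

{code:java}
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

public class DirectBufferPoolSketch {
  // Buffers are grouped by capacity and recycled, so repeated native
  // allocations are avoided on hot paths.
  private final ConcurrentMap<Integer, Queue<ByteBuffer>> pools =
      new ConcurrentHashMap<>();

  public ByteBuffer getBuffer(int size) {
    Queue<ByteBuffer> q = pools.get(size);
    ByteBuffer buf = (q == null) ? null : q.poll();
    if (buf == null) {
      return ByteBuffer.allocateDirect(size);
    }
    buf.clear();
    return buf;
  }

  public void returnBuffer(ByteBuffer buf) {
    pools.computeIfAbsent(buf.capacity(),
        k -> new ConcurrentLinkedQueue<ByteBuffer>()).add(buf);
  }
}
{code}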





[jira] [Created] (HADOOP-10859) Native implementation of java Checksum interface

2014-07-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-10859:


 Summary: Native implementation of java Checksum interface
 Key: HADOOP-10859
 URL: https://issues.apache.org/jira/browse/HADOOP-10859
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


Some parts of our code such as IFileInputStream/IFileOutputStream use the java 
Checksum interface to calculate/verify checksums. Currently we don't have a 
native implementation of these. For CRC32C in particular, we can get a very big 
speedup with a native implementation.





[jira] [Created] (HADOOP-10855) Allow Text to be read with a known length

2014-07-17 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-10855:


 Summary: Allow Text to be read with a known length
 Key: HADOOP-10855
 URL: https://issues.apache.org/jira/browse/HADOOP-10855
 Project: Hadoop Common
  Issue Type: Improvement
  Components: io
Affects Versions: 2.6.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


For the native task work (MAPREDUCE-2841) it is useful to be able to store 
strings in a different fashion than the default (varint-prefixed) 
serialization. We should provide a read method in Text which takes an 
already-known length to support this use case while still providing Text 
objects back to the user.
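
A sketch of what such a read could look like as a static helper (the shipped API may differ):

{code:java}
import java.io.DataInput;
import java.io.IOException;
import org.apache.hadoop.io.Text;

public class KnownLengthReadSketch {
  // The caller already knows how many UTF-8 bytes follow, so no varint
  // length prefix is read from the stream.
  public static void readWithKnownLength(DataInput in, Text text, int len)
      throws IOException {
    byte[] bytes = new byte[len];
    in.readFully(bytes, 0, len);
    text.set(bytes, 0, len);
  }
}
{code}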





[jira] [Created] (HADOOP-10288) Explicit reference to Log4JLogger breaks non-log4j users

2014-01-25 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-10288:


 Summary: Explicit reference to Log4JLogger breaks non-log4j users
 Key: HADOOP-10288
 URL: https://issues.apache.org/jira/browse/HADOOP-10288
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


In HttpRequestLog, we make an explicit reference to the Log4JLogger class for 
an instanceof check. If the log4j implementation isn't actually on the 
classpath, the instanceof check throws NoClassDefFoundError instead of 
returning false. This means that dependent projects that don't use log4j can no 
longer embed HttpServer -- typically this is an issue when they use 
MiniDFSCluster as part of their testing.
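
One defensive shape for the check, sketched here with a name-based comparison so the log4j adapter class is never loaded (not necessarily the committed fix):

{code:java}
import org.apache.commons.logging.Log;

public class RequestLogSketch {
  // instanceof would force loading Log4JLogger and throw
  // NoClassDefFoundError when log4j is absent; comparing class names
  // keeps the check classpath-safe.
  static boolean isLog4jLogger(Log logger) {
    return logger != null
        && "org.apache.commons.logging.impl.Log4JLogger"
            .equals(logger.getClass().getName());
  }
}
{code}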





[jira] [Resolved] (HADOOP-10199) Precommit Admin build is not running because no previous successful build is available

2014-01-02 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-10199.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

 Precommit Admin build is not running because no previous successful build is 
 available
 --

 Key: HADOOP-10199
 URL: https://issues.apache.org/jira/browse/HADOOP-10199
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Blocker
 Attachments: HADOOP-10199.patch


 It seems at some point the builds started failing for an unknown reason, and 
 eventually the last successful build was rolled off. At that point the precommit 
 builds started failing because they pull an artifact from the last successful 
 build.





[jira] [Resolved] (HADOOP-10200) Fix precommit script patch_tested.txt fallback option

2014-01-02 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-10200.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

 Fix precommit script patch_tested.txt fallback option
 -

 Key: HADOOP-10200
 URL: https://issues.apache.org/jira/browse/HADOOP-10200
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HADOOP-10200.patch


 HADOOP-10199 created a fallback option for when there is no successful artifact. 
 However that fallback option used the jenkins lastBuild build indicator. It 
 appears that does not mean the last completed build, but strictly the last 
 build, which in this context is the current build. The current build is 
 running so it doesn't have any artifacts.





[jira] [Resolved] (HADOOP-9765) Precommit Admin job chokes on issues without an attachment

2013-11-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-9765.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

 Precommit Admin job chokes on issues without an attachment
 --

 Key: HADOOP-9765
 URL: https://issues.apache.org/jira/browse/HADOOP-9765
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HADOOP-9765.patch, HADOOP-9765.patch


 Check out this file:  
 https://builds.apache.org/job/PreCommit-Admin/lastSuccessfulBuild/artifact/patch_tested.txt
 It has corrupt data:
 {noformat}
 HIVE-4877HDFS-5010,12593214
 HIVE-4877HBASE-8693,12593082
 HIVE-4877YARN-919,12593107
 YARN-905,12593225
 HIVE-4877HBASE-8752,12588069
 {noformat}
 which resulted in the Hive precommit job being called with the ISSUE_NUM of 
 5010, 8693, 919, and 8752.
 Looking at the script and some output I pulled from the last run, it looks 
 like it gets hosed up when there is a JIRA which is PA but doesn't have an 
 attachment (as ZK-1402 is currently sitting). For example:
 This is the bad data the script is encountering:
 {noformat}
 $ grep -A 2 'ZOOKEEPER-1402' patch_available2.elements 
 ZOOKEEPER-1402
 HBASE-8348
  id=12592318
 {noformat}
 This is where it screws up:
 {noformat}
 $ awk '{ printf "%s", $0 }' patch_available2.elements | sed -e 
 "s/\W*id=\"/,/g" | perl -pe "s/\"/\n/g" | grep ZOOKEEPER-1402
 ZOOKEEPER-1402HBASE-8348 ,12592318
 {noformat}





Re: Next releases

2013-11-12 Thread Todd Lipcon
On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 To be honest, I'm not aware of anything in 2.2.1 that shouldn't be
 there.  However, I have only been following the HDFS and common side
 of things so I may not have the full picture.  Arun, can you give a
 specific example of something you'd like to blow away?


I agree with Colin. If we've been backporting things into a patch release
(third version component) which don't belong, we should explicitly call out
those patches, so we can learn from our mistakes and have a discussion
about what belongs. Otherwise we'll just end up doing it again. Saying
"there were a few mistakes, so let's reset back a bunch of backport work"
seems like a baby-with-the-bathwater situation.

Todd


[jira] [Created] (HADOOP-9908) Fix NPE when versioninfo properties file is missing

2013-08-27 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9908:
---

 Summary: Fix NPE when versioninfo properties file is missing
 Key: HADOOP-9908
 URL: https://issues.apache.org/jira/browse/HADOOP-9908
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.1.0-beta, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hadoop-9908.txt

When running tests in Eclipse I ran into an NPE in VersionInfo since the 
version info properties file didn't properly make it to the classpath. This is 
because getResourceAsStream can return null if the file is not found.
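
The defensive load amounts to a null check before Properties.load; a minimal sketch with hypothetical names:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class VersionInfoSketch {
  // getResourceAsStream returns null when the resource is missing from
  // the classpath, so fall back to empty properties (reporting "Unknown"
  // versions) rather than passing null to Properties.load.
  static Properties load(String resource) {
    Properties props = new Properties();
    InputStream is =
        VersionInfoSketch.class.getClassLoader().getResourceAsStream(resource);
    if (is == null) {
      return props;
    }
    try (InputStream in = is) {
      props.load(in);
    } catch (IOException e) {
      // keep whatever loaded; version strings stay at their defaults
    }
    return props;
  }
}
{code}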



[jira] [Resolved] (HADOOP-8336) LocalFileSystem Does not seek to the correct location when Checksumming is off.

2013-08-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8336.
-

Resolution: Duplicate

Resolved as duplicate of HADOOP-9307, since I'm pretty sure that solved this 
issue.

 LocalFileSystem Does not seek to the correct location when Checksumming is 
 off.
 ---

 Key: HADOOP-8336
 URL: https://issues.apache.org/jira/browse/HADOOP-8336
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Elliott Clark
Assignee: Todd Lipcon
 Attachments: branch-1-test.txt


 Hbase was seeing an issue when trying to read data from a local filesystem 
 instance with setVerifyChecksum(false).  On debugging into it, the seek on 
 the file was seeking to the checksum block index, but since checksumming was 
 off that was the incorrect location.



[jira] [Created] (HADOOP-9898) Set SO_KEEPALIVE on all our sockets

2013-08-22 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9898:
---

 Summary: Set SO_KEEPALIVE on all our sockets
 Key: HADOOP-9898
 URL: https://issues.apache.org/jira/browse/HADOOP-9898
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc, net
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Priority: Minor


We recently saw an issue where network issues between slaves and the NN caused 
ESTABLISHED TCP connections to pile up and leak on the NN side. It looks like 
the RST packets were getting dropped, which meant that the client thought the 
connections were closed, while they hung open forever on the server.

Setting the SO_KEEPALIVE option on our sockets would prevent this kind of leak 
from going unchecked.
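
The per-socket change itself is one call; the work is mostly finding every place we create sockets. A sketch:

{code:java}
import java.net.Socket;
import java.net.SocketException;

public class KeepAliveSketch {
  // With SO_KEEPALIVE set, the kernel probes idle connections, so a
  // server-side socket whose peer silently vanished is eventually torn
  // down instead of sitting ESTABLISHED forever.
  static void configure(Socket socket) throws SocketException {
    socket.setKeepAlive(true);
  }
}
{code}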



[jira] [Created] (HADOOP-9707) Fix register lists for crc32c inline assembly

2013-07-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9707:
---

 Summary: Fix register lists for crc32c inline assembly
 Key: HADOOP-9707
 URL: https://issues.apache.org/jira/browse/HADOOP-9707
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


The inline assembly used for the crc32 instructions has an incorrect clobber 
list: the computed CRC values are in-out variables and thus need to use the 
matching constraint syntax in the clobber list.

This doesn't seem to cause a problem now in Hadoop, but may break in a 
different compiler version which allocates registers differently, or may break 
when the same code is used in another context.



[jira] [Created] (HADOOP-9618) Add thread which detects JVM pauses

2013-06-04 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9618:
---

 Summary: Add thread which detects JVM pauses
 Key: HADOOP-9618
 URL: https://issues.apache.org/jira/browse/HADOOP-9618
 Project: Hadoop Common
  Issue Type: New Feature
  Components: util
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Often times users struggle to understand what happened when a long JVM pause 
(GC or otherwise) causes things to malfunction inside a Hadoop daemon. For 
example, a long GC pause while logging an edit to the QJM may cause the edit to 
timeout, or a long GC pause may make other IPCs to the NameNode timeout. We 
should add a simple thread which loops on 1-second sleeps, and if the sleep 
ever takes significantly longer than 1 second, log a WARN. This will make GC 
pauses obvious in logs.
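
A minimal sketch of the detector loop (thresholds here are illustrative):

{code:java}
public class JvmPauseMonitorSketch {
  private static final long SLEEP_MS = 1000;
  private static final long WARN_THRESHOLD_MS = 10_000; // extra delay

  public static void main(String[] args) throws InterruptedException {
    long before = System.nanoTime();
    while (true) {
      Thread.sleep(SLEEP_MS);
      long now = System.nanoTime();
      long elapsedMs = (now - before) / 1_000_000;
      // A sleep that took far longer than requested usually means a GC
      // pause or a host-level stall; surface it in the log.
      if (elapsedMs - SLEEP_MS > WARN_THRESHOLD_MS) {
        System.err.println("WARN: detected pause of approximately "
            + (elapsedMs - SLEEP_MS) + "ms");
      }
      before = now;
    }
  }
}
{code}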



[jira] [Created] (HADOOP-9608) ZKFC should abort if it sees an unrecognized NN become active

2013-05-30 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9608:
---

 Summary: ZKFC should abort if it sees an unrecognized NN become 
active
 Key: HADOOP-9608
 URL: https://issues.apache.org/jira/browse/HADOOP-9608
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Affects Versions: 3.0.0
Reporter: Todd Lipcon


We recently had an issue where one NameNode and ZKFC was updated to a new 
configuration/IP address but the ZKFC on the other node was not rebooted. Then, 
next time a failover occurred, the second ZKFC was not able to become active 
because the data in the ActiveBreadCrumb didn't match the data in its own 
configuration:

{code}
org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of 
election
java.lang.IllegalArgumentException: Unable to determine service address for 
namenode ''
{code}

To prevent this from happening, whenever the ZKFC sees a new NN become active, 
it should check that it's properly able to instantiate a ServiceTarget for it, 
and if not, abort (since this ZKFC wouldn't be able to handle a failover 
successfully)



[jira] [Created] (HADOOP-9601) Support native CRC on byte arrays

2013-05-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9601:
---

 Summary: Support native CRC on byte arrays
 Key: HADOOP-9601
 URL: https://issues.apache.org/jira/browse/HADOOP-9601
 Project: Hadoop Common
  Issue Type: Improvement
  Components: performance, util
Affects Versions: 3.0.0
Reporter: Todd Lipcon


When we first implemented the Native CRC code, we only did so for direct byte 
buffers, because these correspond directly to native heap memory and thus make 
it easy to access via JNI. We'd generally assumed that accessing byte[] arrays 
from JNI was not efficient enough, but now that I know more about JNI I don't 
think that's true -- we just need to make sure that the critical sections where 
we lock the buffers are short.



[jira] [Created] (HADOOP-9545) Improve logging in ActiveStandbyElector

2013-05-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9545:
---

 Summary: Improve logging in ActiveStandbyElector
 Key: HADOOP-9545
 URL: https://issues.apache.org/jira/browse/HADOOP-9545
 Project: Hadoop Common
  Issue Type: Improvement
  Components: auto-failover, ha
Affects Versions: 2.0.5-beta
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


The ActiveStandbyElector currently logs a lot of stuff at DEBUG level which 
would be useful for troubleshooting. We've seen one instance in the wild of a 
ZKFC thinking it should be in standby state when in fact it won the election, 
but the logging is insufficient to understand why. I'd like to bump most of the 
existing DEBUG logs to INFO and add some additional logs as well.



[jira] [Created] (HADOOP-9420) Add percentile or max metric for rpcQueueTime, processing time

2013-03-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9420:
---

 Summary: Add percentile or max metric for rpcQueueTime, processing 
time
 Key: HADOOP-9420
 URL: https://issues.apache.org/jira/browse/HADOOP-9420
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc, metrics
Affects Versions: 2.0.3-alpha
Reporter: Todd Lipcon


Currently, we only export averages for rpcQueueTime and rpcProcessingTime. 
These metrics are most useful when looking at timeouts and slow responses, 
which in my experience are often caused by momentary spikes in load, which 
won't show up in averages over the 15+ second time intervals often used by 
metrics systems. We should collect at least the max queuetime and processing 
time over each interval, or the percentiles if it's not too expensive.
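
Tracking the per-interval max is cheap; a sketch of the accumulator (the percentile variant needs a sampling structure, which is the potentially expensive part):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class MaxOverIntervalSketch {
  private final AtomicLong max = new AtomicLong();

  // Record the max queue/processing time seen since the last snapshot.
  void record(long millis) {
    max.accumulateAndGet(millis, Math::max);
  }

  // Called by the metrics system each interval; resets so every interval
  // reports its own peak rather than an all-time max.
  long snapshotAndReset() {
    return max.getAndSet(0);
  }
}
{code}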



[jira] [Created] (HADOOP-9399) protoc maven plugin doesn't work on mvn 3.0.2

2013-03-12 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9399:
---

 Summary: protoc maven plugin doesn't work on mvn 3.0.2
 Key: HADOOP-9399
 URL: https://issues.apache.org/jira/browse/HADOOP-9399
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hadoop-9399.txt

On my machine with mvn 3.0.2, I get a ClassCastException trying to use the 
maven protoc plugin. The issue seems to be that mvn 3.0.2 sees the List<File> 
parameter but not its generic type argument, and stuffs Strings inside 
instead. So we get a ClassCastException trying to use the objects as Files.



[jira] [Created] (HADOOP-9358) Auth failed log should include exception string

2013-03-04 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9358:
---

 Summary: Auth failed log should include exception string
 Key: HADOOP-9358
 URL: https://issues.apache.org/jira/browse/HADOOP-9358
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc, security
Affects Versions: 3.0.0, 2.0.4-beta
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Currently, when authentication fails, we see a WARN message like:
{code}
2013-02-28 22:49:03,152 WARN  ipc.Server (Server.java:saslReadAndProcess(1056)) 
- Auth failed for 1.2.3.4:12345:null
{code}
This is not useful to understand the underlying cause. The WARN entry should 
additionally include the exception text, eg:
{code}
2013-02-28 22:49:03,152 WARN  ipc.Server (Server.java:saslReadAndProcess(1056)) 
- Auth failed for 1.2.3.4:12345:null (GSS initiate failed [Caused by 
GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is 
a replay (34))])
{code}



Re: [Vote] Merge branch-trunk-win to trunk

2013-02-27 Thread Todd Lipcon
 mapping, hardlinks, symbolic links, chmod, disk utilization, and
   process/task management.
   3. Added cmd scripts equivalent to existing shell scripts
   hadoop-daemon.sh, start and stop scripts.
   4. Addition of block placement policy implementation to support cloud
   environments, more specifically Azure.

   We are very close to wrapping up the work in branch-trunk-win and
   getting ready for a merge. Currently the merge patch is passing close
   to 100% of unit tests on Linux. Soon I will call for a vote to merge
   this branch into trunk.

   Next steps:
   1. Call for vote to merge branch-trunk-win to trunk, when the work
   completes and precommit build is clean.
   2. Start a discussion on adding Jenkins precommit builds on windows
   and how to integrate that with the existing commit process.

   Let me know if you have any questions.

   Regards,
   Suresh

  --
  http://hortonworks.com/download/



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-02-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9307:
---

 Summary: BufferedFSInputStream.read returns wrong results after 
certain seeks
 Key: HADOOP-9307
 URL: https://issues.apache.org/jira/browse/HADOOP-9307
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.0.2-alpha, 1.1.1
Reporter: Todd Lipcon
Assignee: Todd Lipcon


After certain sequences of seek/read, BufferedFSInputStream can silently return 
data from the wrong part of the file. Further description in first comment 
below.



Re: development environment for hadoop core

2013-01-15 Thread Todd Lipcon
Hi Erik,

When I started out on Hadoop development, I used to use emacs for most of
my development. I eventually saw the light and switched to eclipse with a
bunch of emacs keybindings - using an IDE is really handy in Java for
functions like find callers of, quick navigation to types, etc. etags
gets you part of the way, but I'm pretty sold on eclipse at this point. The
other big advantage I found of Eclipse is that the turnaround time on
running tests is near-instant - make a change, hit save, and run a unit
test in a second or two, instead of waiting 20+sec for maven (even on a
non-clean build).

That said, for quick fixes or remote debugging work I fall back to vim
pretty quickly.

-Todd

On Tue, Jan 15, 2013 at 3:50 PM, Erik Paulson epaul...@unit1127.com wrote:

 Hello -

 I'm curious what Hadoop developers use for their day-to-day hacking on
 Hadoop. I'm talking changes to the Hadoop libraries and daemons, and not
 developing Map-Reduce jobs or using the HDFS Client libraries to talk
 to a filesystem from an application.

 I've checked out Hadoop, made minor changes and built it with Maven, and
 tracked down the resulting artifacts in a target/ directory that I could
 deploy. Is this typically how a cloudera/hortonworks/mapr/etc dev works, or
 are the IDEs more common?

 I realize this sort of sounds like a dumb question, but I'm mostly curious
 what I might be missing out on if I stay away from anything other than vim,
 and not being entirely sure where maven might be caching jars that it uses
 to build, and how careful I have to be to ensure that my changes wind up in
 the right places without having to do a clean build every time.

 Thanks!

 -Erik




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hadoop build slaves software

2013-01-07 Thread Todd Lipcon
I'll install the right protoc and libstdc++ dev on asf009 as well.

-Todd

On Mon, Jan 7, 2013 at 9:57 AM, Andrew Wang andrew.w...@cloudera.com wrote:
 I think hadoop9 has a similar problem as hadoop8, based on a recent build.
 The javac output has a compile-proto error:

 https://builds.apache.org/job/PreCommit-HDFS-Build/3755/
 https://builds.apache.org/job/PreCommit-HDFS-Build/3755/artifact/trunk/patchprocess/trunkJavacWarnings.txt


 On Sun, Jan 6, 2013 at 1:57 AM, Binglin Chang decst...@gmail.com wrote:

 HAServiceProtocol.proto:21:8: Option java_generic_services unknown.
 This is probably caused by an older version of protoc in the build env.


 On Sun, Jan 6, 2013 at 2:12 PM, Giridharan Kesavan 
 gkesa...@hortonworks.com
  wrote:

  by looking at the failure log :
 
 
 https://builds.apache.org/view/Hadoop/job/PreCommit-HADOOP-Build/1950/artifact/trunk/patchprocess/trunkJavacWarnings.txt
  build failed on
 
  [INFO] --- exec-maven-plugin:1.2:exec (compile-proto) @ hadoop-common ---
 
  HAServiceProtocol.proto:21:8: Option java_generic_services unknown.
 
  I'm not sure if this is something to do with the build env.
 
  -Giri
 
 
  On Sat, Jan 5, 2013 at 5:57 PM, Binglin Chang decst...@gmail.com
 wrote:
 
   I am not sure if this problem is solved, the build still failed in
   precommit-HADOOP
   https://builds.apache.org/view/Hadoop/job/PreCommit-HADOOP-Build/
  
  
   On Sat, Jan 5, 2013 at 6:46 AM, Giridharan Kesavan 
   gkesa...@hortonworks.com
wrote:
  
Marking the slave offline would do. I've marked the hadoop8 slave
   offline,
while I test it for builds and bring it back online later when it's
  good.
   
   
-Giri
   
   
On Fri, Jan 4, 2013 at 2:26 PM, Todd Lipcon t...@cloudera.com
 wrote:
   
 Turns out I had to both kill -9 it and chmod 000
 /home/jenkins/jenkins-slave in order to keep it from
 auto-respawning.
 Just a note so that once the toolchain is fixed, someone knows to
 re-chmod back to 755.

 -Todd

 On Fri, Jan 4, 2013 at 2:11 PM, Todd Lipcon t...@cloudera.com
  wrote:
  I'm going to kill -9 the jenkins slave on hadoop8 for now cuz
 it's
  causing havoc on the precommit builds. I can't see another way to
  administratively disable it from the Jenkins interface.
 
  Rajiv, Giri -- mind if I build/install protoc into /usr/local to
   match
  the other slaves? We can continue the conversation about
  provisioning
  after, but would like to unblock the builds in the meantime.
 
  As for CentOS vs Ubuntu, I've got no preference. RHEL6 is
 probably
  preferable since it's a more common install platform, anyway.
 But,
  we'll still need to have a custom toolchain for things like
 protoc
   2.4
  which don't have new enough versions in the package repos.
 
  -Todd
 
  On Fri, Jan 4, 2013 at 2:03 PM, Colin McCabe 
  cmcc...@alumni.cmu.edu
   
 wrote:
  In addition to protoc, can someone please also install a 32-bit
  C++
 compiler?
 
  The builds are all failing on this machine because of that.
 
  regards,
  Colin
 
 
  On Fri, Jan 4, 2013 at 11:37 AM, Giridharan Kesavan
  gkesa...@hortonworks.com wrote:
  When I configured the other machines I used the source to
 compile
   and
  install the protoc, as the 2.4.1 wasn't available in the ubuntu
   repo.
 
  BTW installed 2.4.1 on asf008.
  gkesavan@asf008:~$ protoc --version
  libprotoc 2.4.1
 
 
  -Giri
 
 
  On Thu, Jan 3, 2013 at 11:24 PM, Todd Lipcon 
 t...@cloudera.com
 wrote:
 
  Hey folks,
 
  It looks like hadoop8 has recently come back online as a build
slave,
  but is failing all the builds because it has an ancient
 version
  of
  protobuf (2.2.0):
  todd@asf008:~$ protoc  --version
  libprotoc 2.2.0
 
  In contrast, other slaves have 2.4.1:
  todd@asf001:~$ protoc --version
  libprotoc 2.4.1
 
  asf001 has the newer protoc in /usr/local/bin but asf008 does
  not.
  Does anyone know how software is meant to be deployed on these
   build
  slaves? I'm happy to download and install protobuf 2.4.1 into
  /usr/local on asf008 if manual installation is the name of the
   game,
  but it seems like we should be doing something a little more
  reproducible than one-off builds by rando developers to manage
  our
  toolchain on the Jenkins slaves.
 
  -Todd
  --
  Todd Lipcon
  Software Engineer, Cloudera
 
 
 
 
  --
  Todd Lipcon
  Software Engineer, Cloudera



 --
 Todd Lipcon
 Software Engineer, Cloudera

   
  
 




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hadoop build slaves software

2013-01-07 Thread Todd Lipcon
OK. FYI, installed protoc on asf009, and the g++-4.4-multilib
packages on both asf008 and asf009. Checked the hadoop pipes native
build and it passes now. Fingers crossed...

-Todd

On Mon, Jan 7, 2013 at 3:35 PM, Todd Lipcon t...@cloudera.com wrote:
 I'll install the right protoc and libstdc++ dev on asf009 as well.

 -Todd

 On Mon, Jan 7, 2013 at 9:57 AM, Andrew Wang andrew.w...@cloudera.com wrote:
 I think hadoop9 has a similar problem as hadoop8, based on a recent build.
 The javac output has a compile-proto error:

 https://builds.apache.org/job/PreCommit-HDFS-Build/3755/
 https://builds.apache.org/job/PreCommit-HDFS-Build/3755/artifact/trunk/patchprocess/trunkJavacWarnings.txt



Re: Hadoop build slaves software

2013-01-07 Thread Todd Lipcon
It was missing the c++ libraries though:
https://builds.apache.org/job/PreCommit-HADOOP-Build/2002/artifact/trunk/patchprocess/patchJavacWarnings.txt

-Todd

On Mon, Jan 7, 2013 at 3:46 PM, Giridharan Kesavan
gkesa...@hortonworks.com wrote:
 I did install protoc on hadoop9 and brought it back online after testing it
 couple of hours back.


 -Giri


 On Mon, Jan 7, 2013 at 3:35 PM, Todd Lipcon t...@cloudera.com wrote:

 I'll install the right protoc and libstdc++ dev on asf009 as well.

 -Todd


[jira] [Created] (HADOOP-9150) Unnecessary DNS resolution attempts for logical URIs

2012-12-17 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9150:
---

 Summary: Unnecessary DNS resolution attempts for logical URIs
 Key: HADOOP-9150
 URL: https://issues.apache.org/jira/browse/HADOOP-9150
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Todd Lipcon
Priority: Critical


In the FileSystem code, we accidentally try to DNS-resolve the logical name 
before it is converted to an actual domain name. In some DNS setups, this can 
cause a big slowdown - eg in one misconfigured cluster we saw a 2-3x drop in 
terasort throughput, since every task wasted a lot of time waiting for slow 
"not found" responses from DNS.
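For illustration only, a minimal sketch of the kind of guard that avoids the 
spurious lookup; the helper name and the dfs.ha.namenodes.* probe below are 
assumptions, not the actual patch:
{code}
// Skip DNS resolution when the URI authority is a logical HA nameservice.
static InetSocketAddress resolveIfPhysical(Configuration conf, URI uri) {
  String host = uri.getHost();
  if (conf.get("dfs.ha.namenodes." + host) != null) {
    // Logical name: defer resolution until a concrete NN address is chosen.
    return InetSocketAddress.createUnresolved(host, uri.getPort());
  }
  return new InetSocketAddress(host, uri.getPort()); // real host: resolve now
}
{code}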

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Unstuck QA bot

2012-12-12 Thread Todd Lipcon
The QA bot was previously pointing at a JIRA filter which matched all
Patch Available issues. That number has grown to 300 across the many
projects that the QA bot works on. Unfortunately, that meant that a
lot of the newer JIRAs (especially in alphabetically lower projects
like HBASE, HADOOP, and GIRAPH) weren't getting picked up. I duped the
filter, restricted it to those PA issues updated in the last 2 weeks,
and changed the jenkins PreCommit-Admin job to point to the new one.
It seems to have now unstuck itself and is processing a bunch of
attachments that it had fallen behind on.

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-9112) test-patch should -1 for @Tests without a timeout

2012-12-03 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9112:
---

 Summary: test-patch should -1 for @Tests without a timeout
 Key: HADOOP-9112
 URL: https://issues.apache.org/jira/browse/HADOOP-9112
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Todd Lipcon


With our current test running infrastructure, if a test with no timeout set 
runs too long, it triggers a surefire-wide timeout, which for some reason 
doesn't show up as a failed test in the test-patch output. Given that, we 
should require that all tests have a timeout set, and have test-patch enforce 
this with a simple check.
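For reference, this is the kind of per-test timeout the check would look for 
(JUnit 4):
{code}
// A per-test timeout fails just this test instead of tripping the
// surefire-wide fork timeout (value in milliseconds).
@Test(timeout = 60000)
public void testSomething() throws Exception {
  // ...
}
{code}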

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9106) Allow configuration of IPC connect timeout

2012-11-29 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9106:
---

 Summary: Allow configuration of IPC connect timeout
 Key: HADOOP-9106
 URL: https://issues.apache.org/jira/browse/HADOOP-9106
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently the connection timeout in Client.setupConnection() is hard-coded to 
20 seconds. This is unreasonable in some scenarios, such as HA failover, if we 
want a faster failover time. We should allow this to be configured per-client.
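A sketch of what per-client configuration could look like; the key name 
ipc.client.connect.timeout is illustrative until a patch settles it:
{code}
Configuration conf = new Configuration();
// Give up on connect attempts after 5s instead of the hard-coded 20s.
conf.setInt("ipc.client.connect.timeout", 5000);
{code}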

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Why don't the ipc Server use ArrayBlockingQueue for callQueue?

2012-10-31 Thread Todd Lipcon
Hi Luoli

Why would it be more efficient? When suggesting an improvement, it would be
good to back it up with your reasoning.

-Todd

On Wed, Oct 31, 2012 at 1:19 AM, 罗李 luoli...@gmail.com wrote:

 hi everybody:
 I have a little question: why doesn't the ipc Server in hadoop
 use ArrayBlockingQueue for the callQueue instead of LinkedBlockingQueue? Would
 it be more efficient?

 thanks

 luoli




-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-8929) Add toString for SampleQuantiles

2012-10-15 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8929:
---

 Summary: Add toString for SampleQuantiles
 Key: HADOOP-8929
 URL: https://issues.apache.org/jira/browse/HADOOP-8929
 Project: Hadoop Common
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Todd Lipcon


The new SampleQuantiles class is useful in the context of benchmarks, but 
currently there is no way to print it out outside the context of a metrics 
sink. It would be nice to have a convenient way to stringify it for logging, 
etc.
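For illustration, a rough sketch of what such a toString() could produce; the 
quantile set and the query() accessor (assumed to return the long estimate for 
a given quantile) are placeholders:
{code}
// e.g. "50%: 312, 75%: 401, 90%: 477, 95%: 512, 99%: 608"
@Override
public synchronized String toString() {
  StringBuilder sb = new StringBuilder();
  for (double q : new double[] {0.50, 0.75, 0.90, 0.95, 0.99}) {
    if (sb.length() > 0) {
      sb.append(", ");
    }
    sb.append(String.format("%.0f%%: %d", q * 100, query(q)));
  }
  return sb.toString();
}
{code}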

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8905) Add metrics for HTTP Server

2012-10-09 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8905:
---

 Summary: Add metrics for HTTP Server
 Key: HADOOP-8905
 URL: https://issues.apache.org/jira/browse/HADOOP-8905
 Project: Hadoop Common
  Issue Type: Improvement
  Components: metrics
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently we don't expose any metrics about the HTTP server. It would be useful 
to be able to monitor the following:
- Number of threads currently actively serving servlet requests
- Total number of requests served
- Perhaps break down time/count by endpoint (eg /jmx, /conf, various JSPs)

This becomes more important as http-based protocols like webhdfs become more 
common.
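As a sketch of what registering these via metrics2 might look like (class and 
metric names here are illustrative):
{code}
@Metrics(about = "HTTP server metrics", context = "http")
class HttpServerMetrics {
  @Metric("Threads currently actively serving servlet requests")
  MutableGaugeInt activeRequests;
  @Metric("Total number of requests served")
  MutableCounterLong totalRequests;
}
{code}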

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8889) Upgrade to Surefire 2.12.3

2012-10-05 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8889:
---

 Summary: Upgrade to Surefire 2.12.3
 Key: HADOOP-8889
 URL: https://issues.apache.org/jira/browse/HADOOP-8889
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build, test
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hadoop-8889.txt

Surefire 2.12.3 has a couple improvements which are helpful for us. In 
particular, it fixes http://jira.codehaus.org/browse/SUREFIRE-817 which has 
been aggravating in the past.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8894) GenericTestUtils.waitFor should dump thread stacks on timeout

2012-10-05 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8894:
---

 Summary: GenericTestUtils.waitFor should dump thread stacks on 
timeout
 Key: HADOOP-8894
 URL: https://issues.apache.org/jira/browse/HADOOP-8894
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Many tests use this utility to wait for a condition to become true. In the 
event that it times out, we should dump all the thread stack traces, in case 
the timeout was due to a deadlock. This should make it easier to debug 
scenarios like HDFS-4001.
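The dump itself can be as simple as the following sketch (method name 
illustrative; uses java.util.Map and java.io.PrintStream):
{code}
private static void dumpAllThreadStacks(PrintStream out) {
  for (Map.Entry<Thread, StackTraceElement[]> e :
      Thread.getAllStackTraces().entrySet()) {
    out.println(e.getKey());
    for (StackTraceElement frame : e.getValue()) {
      out.println("    at " + frame);
    }
  }
}
{code}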

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8855) SSL-based image transfer does not work when Kerberos is disabled

2012-09-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8855:
---

 Summary: SSL-based image transfer does not work when Kerberos is 
disabled
 Key: HADOOP-8855
 URL: https://issues.apache.org/jira/browse/HADOOP-8855
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 3.0.0, 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


In SecurityUtil.openSecureHttpConnection, we first check 
{{UserGroupInformation.isSecurityEnabled()}}. However, this only checks the 
kerberos config, which is independent of {{hadoop.ssl.enabled}}. Instead, we 
should check {{HttpConfig.isSecure()}}.
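That is, roughly (sketch only):
{code}
// Decide on SSL from the HTTP config rather than the kerberos config:
boolean useSsl = HttpConfig.isSecure();  // tracks hadoop.ssl.enabled
// instead of: boolean useSsl = UserGroupInformation.isSecurityEnabled();
{code}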

Credit to Wing Yew Poon for discovering this bug

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Test timeouts

2012-09-14 Thread Todd Lipcon
On Fri, Sep 14, 2012 at 11:23 AM, Aaron T. Myers a...@cloudera.com wrote:

 So, if you see a test case fail by reaching the Surefire fork timeout,
 please file a JIRA to add a JUnit timeout for that test. If when adding a
 test case you think that it might time out, please add a JUnit timeout.

I'd go one step further:
If you add any test that isn't a true unit test (ie it relies on any
multithreading, either explicitly or by using miniclusters, etc.), you
should add a timeout. Even if it has to be conservative (like 5
minutes on a test that you expect only runs 10 seconds), it seems
better than waiting for it to timeout and then later having to go back
and add one.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init

2012-09-10 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8786:
---

 Summary: HttpServer continues to start even if 
AuthenticationFilter fails to init
 Key: HADOOP-8786
 URL: https://issues.apache.org/jira/browse/HADOOP-8786
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.1-alpha, 1.2.0, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the web 
server will continue to start up. We need to check for context initialization 
errors after starting the server.
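One possible shape for that check (a sketch; getUnavailableException() is an 
assumption about what the servlet context exposes):
{code}
webServer.start();
// A failed filter init leaves the context "unavailable" rather than
// failing start(), so probe for it explicitly and fail loudly.
Throwable initFailure = webAppContext.getUnavailableException();  // assumed API
if (initFailure != null) {
  throw new IOException("Unable to initialize WebAppContext", initFailure);
}
{code}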

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8757) Metrics should disallow names with invalid characters

2012-08-31 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8757:
---

 Summary: Metrics should disallow names with invalid characters
 Key: HADOOP-8757
 URL: https://issues.apache.org/jira/browse/HADOOP-8757
 Project: Hadoop Common
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Todd Lipcon
Priority: Minor


Just spent a couple hours trying to figure out why a metric I added didn't show 
up in JMX, only to eventually realize it was because I had a whitespace in the 
property name. This didn't cause any errors to be logged -- the metric just 
didn't show up in JMX. We should check that the name is valid and log an error, 
or replace invalid characters with something like an underscore.
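A sketch of the check-and-fail option; the accepted name pattern below is an 
assumption:
{code}
private static final Pattern VALID_NAME =
    Pattern.compile("[a-zA-Z_][a-zA-Z0-9_-]*");

static String checkMetricName(String name) {
  if (!VALID_NAME.matcher(name).matches()) {
    throw new MetricsException("Invalid metric/property name: '" + name + "'");
  }
  return name;
}
{code}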

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-8031) Configuration class fails to find embedded .jar resources; should use URL.openStream()

2012-08-30 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8031.
-

Resolution: Fixed

Re-resolving this since Ahmed is addressing my issue in HADOOP-8749

 Configuration class fails to find embedded .jar resources; should use 
 URL.openStream()
 --

 Key: HADOOP-8031
 URL: https://issues.apache.org/jira/browse/HADOOP-8031
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.0.0-alpha
Reporter: Elias Ross
Assignee: Elias Ross
 Fix For: 2.2.0-alpha

 Attachments: 0001-fix-HADOOP-7982-class-loader.patch, 
 HADOOP-8031-part2.patch, HADOOP-8031.patch, hadoop-8031.txt


 While running a hadoop client within RHQ (monitoring software) using its 
 classloader, I see this:
 2012-02-07 09:15:25,313 INFO  [ResourceContainer.invoker.daemon-2] 
 (org.apache.hadoop.conf.Configuration)- parsing 
 jar:file:/usr/local/rhq-agent/data/tmp/rhq-hadoop-plugin-4.3.0-SNAPSHOT.jar6856622641102893436.classloader/hadoop-core-0.20.2+737+1.jar7204287718482036191.tmp!/core-default.xml
 2012-02-07 09:15:25,318 ERROR [InventoryManager.discovery-1] 
 (rhq.core.pc.inventory.InventoryManager)- Failed to start component for 
 Resource[id=16290, type=NameNode, key=NameNode:/usr/lib/hadoop-0.20, 
 name=NameNode, parent=vg61l01ad-hadoop002.apple.com] from synchronized merge.
 org.rhq.core.clientapi.agent.PluginContainerException: Failed to start 
 component for resource Resource[id=16290, type=NameNode, 
 key=NameNode:/usr/lib/hadoop-0.20, name=NameNode, 
 parent=vg61l01ad-hadoop002.apple.com].
 Caused by: java.lang.RuntimeException: core-site.xml not found
   at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
   at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1228)
   at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1169)
   at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
 This is because the URL
 jar:file:/usr/local/rhq-agent/data/tmp/rhq-hadoop-plugin-4.3.0-SNAPSHOT.jar6856622641102893436.classloader/hadoop-core-0.20.2+737+1.jar7204287718482036191.tmp!/core-default.xml
 cannot be found by DocumentBuilder (doesn't understand it). (Note: the logs 
 are for an old version of Configuration class, but the new version has the 
 same code.)
 The solution is to obtain the resource stream directly from the URL object 
 itself.
 That is to say:
 {code}
  URL url = getResource((String)name);
 -if (url != null) {
 -  if (!quiet) {
 -LOG.info("parsing " + url);
 -  }
 -  doc = builder.parse(url.toString());
 -}
 +doc = builder.parse(url.openStream());
 {code}
 Note: I have a full patch pending approval at Apple for this change, 
 including some cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HADOOP-8031) Configuration class fails to find embedded .jar resources; should use URL.openStream()

2012-08-29 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HADOOP-8031:
-


I confirmed that reverting this patch locally restored the old behavior.

If we can't maintain the old behavior, we should at least mark this as an 
incompatible change. But I bet it's doable to both fix it and have relative 
xincludes.

 Configuration class fails to find embedded .jar resources; should use 
 URL.openStream()
 --

 Key: HADOOP-8031
 URL: https://issues.apache.org/jira/browse/HADOOP-8031
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.0.0-alpha
Reporter: Elias Ross
Assignee: Elias Ross
 Fix For: 2.2.0-alpha

 Attachments: 0001-fix-HADOOP-7982-class-loader.patch, 
 HADOOP-8031.patch, hadoop-8031.txt


 While running a hadoop client within RHQ (monitoring software) using its 
 classloader, I see this:
 2012-02-07 09:15:25,313 INFO  [ResourceContainer.invoker.daemon-2] 
 (org.apache.hadoop.conf.Configuration)- parsing 
 jar:file:/usr/local/rhq-agent/data/tmp/rhq-hadoop-plugin-4.3.0-SNAPSHOT.jar6856622641102893436.classloader/hadoop-core-0.20.2+737+1.jar7204287718482036191.tmp!/core-default.xml
 2012-02-07 09:15:25,318 ERROR [InventoryManager.discovery-1] 
 (rhq.core.pc.inventory.InventoryManager)- Failed to start component for 
 Resource[id=16290, type=NameNode, key=NameNode:/usr/lib/hadoop-0.20, 
 name=NameNode, parent=vg61l01ad-hadoop002.apple.com] from synchronized merge.
 org.rhq.core.clientapi.agent.PluginContainerException: Failed to start 
 component for resource Resource[id=16290, type=NameNode, 
 key=NameNode:/usr/lib/hadoop-0.20, name=NameNode, 
 parent=vg61l01ad-hadoop002.apple.com].
 Caused by: java.lang.RuntimeException: core-site.xml not found
   at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
   at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1228)
   at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1169)
   at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
 This is because the URL
 jar:file:/usr/local/rhq-agent/data/tmp/rhq-hadoop-plugin-4.3.0-SNAPSHOT.jar6856622641102893436.classloader/hadoop-core-0.20.2+737+1.jar7204287718482036191.tmp!/core-default.xml
 cannot be found by DocumentBuilder (doesn't understand it). (Note: the logs 
 are for an old version of Configuration class, but the new version has the 
 same code.)
 The solution is to obtain the resource stream directly from the URL object 
 itself.
 That is to say:
 {code}
  URL url = getResource((String)name);
 -if (url != null) {
 -  if (!quiet) {
 -LOG.info("parsing " + url);
 -  }
 -  doc = builder.parse(url.toString());
 -}
 +doc = builder.parse(url.openStream());
 {code}
 Note: I have a full patch pending approval at Apple for this change, 
 including some cleanup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8624) ProtobufRpcEngine should log all RPCs if TRACE logging is enabled

2012-07-25 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8624:
---

 Summary: ProtobufRpcEngine should log all RPCs if TRACE logging is 
enabled
 Key: HADOOP-8624
 URL: https://issues.apache.org/jira/browse/HADOOP-8624
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Affects Versions: 3.0.0, 2.2.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


Since all RPC requests/responses are now ProtoBufs, it's easy to add a TRACE 
level logging output for ProtobufRpcEngine that actually shows the full content 
of all calls. This is very handy especially when writing/debugging unit tests, 
but might also be useful to enable at runtime for short periods of time to 
debug certain production issues.
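Turning it on would then be a one-line logger setting, e.g. in 
log4j.properties:
{code}
log4j.logger.org.apache.hadoop.ipc.ProtobufRpcEngine=TRACE
{code}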

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8608) Add Configuration API for parsing time durations

2012-07-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8608:
---

 Summary: Add Configuration API for parsing time durations
 Key: HADOOP-8608
 URL: https://issues.apache.org/jira/browse/HADOOP-8608
 Project: Hadoop Common
  Issue Type: Improvement
  Components: conf
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Hadoop has a lot of configurations which specify durations or intervals of 
time. Unfortunately these different configurations have little consistency in 
units - eg some are in milliseconds, some in seconds, and some in minutes. This 
makes it difficult for users to configure, since they have to always refer back 
to docs to remember the unit for each property.

The proposed solution is to add an API like {{Configuration.getTimeDuration}} 
which allows the user to specify the units with a suffix. For example, "10ms", 
"10s", "10m", "10h", or even "10d". For backwards-compatibility, if the user 
does not specify a unit, the API can specify the default unit, and warn the 
user that they should specify an explicit unit instead.
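Illustrative usage of the proposed API; the signature is part of the proposal, 
and the key name here is just an example:
{code}
// Accepts "10s", "10m", "500ms", etc.; a bare number falls back to the
// caller-supplied default unit.
long periodMs = conf.getTimeDuration(
    "my.example.check.period", 60000, TimeUnit.MILLISECONDS);
{code}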

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8557) Core Test failed in jenkins for patch pre-commit

2012-07-17 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8557.
-

Resolution: Duplicate

Resolving as dup of HADOOP-8537

 Core Test failed in jenkins for patch pre-commit 
 

 Key: HADOOP-8557
 URL: https://issues.apache.org/jira/browse/HADOOP-8557
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha
Reporter: Junping Du
Priority: Blocker

 In jenkins PreCommit build history 
 (https://builds.apache.org/job/PreCommit-HADOOP-Build/), following tests are 
 failed for all recently patches (build-1164,1166,1168,1170):
 org.apache.hadoop.ha.TestZKFailoverController.testGracefulFailover
 org.apache.hadoop.ha.TestZKFailoverController.testOneOfEverything 
 org.apache.hadoop.io.file.tfile.TestTFileByteArrays.testOneBlock 
 org.apache.hadoop.io.file.tfile.TestTFileByteArrays.testOneBlockPlusOneEntry  
 org.apache.hadoop.io.file.tfile.TestTFileByteArrays.testThreeBlocks 
 org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays.testOneBlock
  
 org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays.testOneBlockPlusOneEntry
  
 org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays.testThreeBlocks
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8423) MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data

2012-07-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8423.
-

   Resolution: Fixed
Fix Version/s: 1.2.0

Committed to branch-1 for 1.2. Thanks for backporting, Harsh.

 MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO 
 block-compressed data
 --

 Key: HADOOP-8423
 URL: https://issues.apache.org/jira/browse/HADOOP-8423
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.2
 Environment: Linux 2.6.32.23-0.3-default #1 SMP 2010-10-07 14:57:45 
 +0200 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Jason B
Assignee: Todd Lipcon
 Fix For: 1.2.0, 2.1.0-alpha

 Attachments: HADOOP-8423-branch-1.patch, HADOOP-8423-branch-1.patch, 
 MapFileCodecTest.java, hadoop-8423.txt


 I am using Cloudera distribution cdh3u1.
 When trying to check native codecs for better decompression
 performance such as Snappy or LZO, I ran into issues with random
 access using MapFile.Reader.get(key, value) method.
 First call of MapFile.Reader.get() works but a second call fails.
 Also  I am getting different exceptions depending on number of entries
 in a map file.
 With LzoCodec and 10 record file, jvm gets aborted.
 At the same time the DefaultCodec works fine for all cases, as well as
 record compression for the native codecs.
 I created a simple test program (attached) that creates map files
 locally with sizes of 10 and 100 records for three codecs: Default,
 Snappy, and LZO.
 (The test requires corresponding native library available)
 The summary of problems are given below:
 Map Size: 100
 Compression: RECORD
 ==
 DefaultCodec:  OK
 SnappyCodec: OK
 LzoCodec: OK
 Map Size: 10
 Compression: RECORD
 ==
 DefaultCodec:  OK
 SnappyCodec: OK
 LzoCodec: OK
 Map Size: 100
 Compression: BLOCK
 
 DefaultCodec:  OK
 SnappyCodec: java.io.EOFException  at
 org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
 LzoCodec: java.io.EOFException at
 org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
 Map Size: 10
 Compression: BLOCK
 ==
 DefaultCodec:  OK
 SnappyCodec: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
 at 
 org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native
 Method)
 LzoCodec:
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x2b068ffcbc00, pid=6385, tid=47304763508496
 #
 # JRE version: 6.0_21-b07
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b17 mixed mode linux-amd64 
 )
 # Problematic frame:
 # C  [liblzo2.so.2+0x13c00]  lzo1x_decompress+0x1a0
 #

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8590) Backport HADOOP-7318 (MD5Hash factory should reset the digester it returns) to branch-1

2012-07-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8590:
---

 Summary: Backport HADOOP-7318 (MD5Hash factory should reset the 
digester it returns) to branch-1
 Key: HADOOP-8590
 URL: https://issues.apache.org/jira/browse/HADOOP-8590
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 1.0.3
Reporter: Todd Lipcon


I ran into this bug on branch-1 today; it seems like we should backport it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Jetty fixes for Hadoop

2012-07-10 Thread Todd Lipcon
+1 from me too. We've had this in CDH since Sep '11 and it's been working
much better than the stock 6.1.26.

-Todd

On Tue, Jul 10, 2012 at 3:14 PM, Owen O'Malley omal...@apache.org wrote:
 On Tue, Jul 10, 2012 at 2:59 PM, Thomas Graves tgra...@yahoo-inc.comwrote:

 I'm +1 for adding it.


 I'm +1 also.

 -- Owen



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: PreCommit-Admin not running

2012-07-04 Thread Todd Lipcon
I reported it to builds@ earlier this week, seems to be running
sporadically now.

We had some of our build slaves dead due to the leap second bug. I did the
date trick to try to fix them, but maybe Hudson also needs a restart on
the slaves?

On Mon, Jul 2, 2012 at 7:31 PM, Jun Ping Du j...@vmware.com wrote:

 Moving to the dev alias; it seems to have stopped working since the weekend.

 Thanks,

 Junping

 - Original Message -
 From: Kihwal Lee kih...@yahoo-inc.com
 To: gene...@hadoop.apache.org
 Sent: Tuesday, July 3, 2012 3:59:28 AM
 Subject: PreCommit-Admin not running

 It looks like the PreCommit-Admin build job is not running.
 Can anyone give it a gentle nudge?

 Kihwal




-- 
Todd Lipcon
Software Engineer, Cloudera


Scheduled jenkins builds not running?

2012-07-03 Thread Todd Lipcon
Our Precommit build (https://builds.apache.org/job/PreCommit-Admin/)
appears to have stopped running at 7/2 5PM. I've manually triggered it a
couple times since, but otherwise not running. Looking at the config, it
doesn't look like anyone explicitly disabled it.

Any idea what might be up? #asfinfra directed me to mail builds@

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-8537) Two TFile tests failing recently

2012-06-27 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8537:
---

 Summary: Two TFile tests failing recently
 Key: HADOOP-8537
 URL: https://issues.apache.org/jira/browse/HADOOP-8537
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 3.0.0
Reporter: Todd Lipcon


TestTFileJClassComparatorByteArrays and TestTFileByteArrays are failing in some 
recent patch builds (seems to have started in the middle of May). These tests 
previously failed in HADOOP-7111 - perhaps something regressed there?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8529) Error while formatting the namenode in hadoop single node setup in windows

2012-06-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8529.
-

Resolution: Invalid

Please inquire with the user mailing lists for questions like this. JIRA is 
meant for bug/task tracking.

 Error while formatting the namenode in hadoop single node setup in windows
 --

 Key: HADOOP-8529
 URL: https://issues.apache.org/jira/browse/HADOOP-8529
 Project: Hadoop Common
  Issue Type: Task
  Components: conf
Affects Versions: 1.0.3
 Environment: Windows XP using Cygwin
Reporter: Narayana Karteek
Priority: Blocker
 Attachments: capture8.bmp

   Original Estimate: 5h
  Remaining Estimate: 5h

 Hi,
   I tried to configure hadoop 1.0.3. I added all libs from the share folder 
 to the lib directory. But I still get the error while formatting the 
  namenode
 $ ./hadoop namenode -format
 java.lang.NoClassDefFoundError:
 Caused by: java.lang.ClassNotFoundException:
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 .  Program will exit.in class:
 Exception in thread main

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8497) Shell needs a way to list amount of physical consumed space in a directory

2012-06-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8497:
---

 Summary: Shell needs a way to list amount of physical consumed 
space in a directory
 Key: HADOOP-8497
 URL: https://issues.apache.org/jira/browse/HADOOP-8497
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.0.0-alpha, 1.0.3, 3.0.0
Reporter: Todd Lipcon
Assignee: Andy Isaacson


Currently, there is no way to see the physical consumed space for a directory. 
du lists the logical (pre-replication) space, and fs -count only displays the 
consumed space when a quota is set. This makes it hard for administrators to 
set a quota on a directory, since they have no way to determine a reasonable 
value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8410) SPNEGO filter should have better error messages when not fully configured

2012-05-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8410:
---

 Summary: SPNEGO filter should have better error messages when not 
fully configured
 Key: HADOOP-8410
 URL: https://issues.apache.org/jira/browse/HADOOP-8410
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Priority: Minor


I upgraded to a build which includes SPNEGO, but neglected to configure  
dfs.web.authentication.kerberos.principal. This resulted in the following error:

12/05/18 14:46:20 INFO server.KerberosAuthenticationHandler: Login using keytab 
//home/todd/confs/conf.pseudo.security//hdfs.keytab, for principal 
${dfs.web.authentication.kerberos.principal}
12/05/18 14:46:20 WARN mortbay.log: failed SpnegoFilter: 
javax.servlet.ServletException: javax.security.auth.login.LoginException: 
Unable to obtain password from user

Instead, it should give an error that the principal needs to be configured. 
Even better would be if we could default to HTTP/_HOST@<default realm>.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8404) etc/hadoop in binary tarball missing hadoop-env.sh

2012-05-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8404:
---

 Summary: etc/hadoop in binary tarball missing hadoop-env.sh
 Key: HADOOP-8404
 URL: https://issues.apache.org/jira/browse/HADOOP-8404
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Todd Lipcon


todd@todd-w510:~/releases/hadoop-2.0.0-alpha$ ls etc/hadoop/
core-site.xml   hdfs-site.xmlhttpfs-signature.secret  
slaves  yarn-env.sh
hadoop-metrics2.properties  httpfs-env.shhttpfs-site.xml  
ssl-client.xml.example  yarn-site.xml
hadoop-metrics.properties   httpfs-log4j.properties  log4j.properties 
ssl-server.xml.example


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8405) ZKFC tests leak ZK instances

2012-05-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8405:
---

 Summary: ZKFC tests leak ZK instances
 Key: HADOOP-8405
 URL: https://issues.apache.org/jira/browse/HADOOP-8405
 Project: Hadoop Common
  Issue Type: Bug
  Components: auto-failover, test
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hadoop-8405.txt

The ZKFC code wasn't previously terminating the ZK connection in all cases 
where it should (eg after a failed startup or after formatting ZK). This didn't 
cause a problem for CLI usage, since the process exited afterwards, but caused 
the test results to get clouded with a lot of "Reconnecting to ZK" messages, 
which make the logs hard to read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8405) ZKFC tests leak ZK instances

2012-05-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8405.
-

   Resolution: Fixed
Fix Version/s: Auto Failover (HDFS-3042)
 Hadoop Flags: Reviewed

Committed to branch, thanks Eli.

 ZKFC tests leak ZK instances
 

 Key: HADOOP-8405
 URL: https://issues.apache.org/jira/browse/HADOOP-8405
 Project: Hadoop Common
  Issue Type: Bug
  Components: auto-failover, test
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: Auto Failover (HDFS-3042)

 Attachments: hadoop-8405.txt


 The ZKFC code wasn't previously terminating the ZK connection in all cases 
 where it should (eg after a failed startup or after formatting ZK). This 
 didn't cause a problem for CLI usage, since the process exited afterwards, 
 but caused the test results to get clouded with a lot of "Reconnecting to ZK" 
 messages, which make the logs hard to read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8406) CompressionCodecFactory.CODEC_PROVIDERS iteration is thread-unsafe

2012-05-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8406:
---

 Summary: CompressionCodecFactory.CODEC_PROVIDERS iteration is 
thread-unsafe
 Key: HADOOP-8406
 URL: https://issues.apache.org/jira/browse/HADOOP-8406
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


CompressionCodecFactory defines CODEC_PROVIDERS as:
{code}
  private static final ServiceLoader<CompressionCodec> CODEC_PROVIDERS =
      ServiceLoader.load(CompressionCodec.class);
{code}
but this is a lazy collection which is thread-unsafe to iterate. We either need 
to synchronize when we iterate over it, or we need to materialize it during 
class-loading time by copying it to a non-lazy collection.
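A sketch of the materialize-at-class-load option:
{code}
private static final List<CompressionCodec> CODEC_PROVIDERS;
static {
  // Iterate the lazy ServiceLoader exactly once, under the class-init lock,
  // then expose an immutable snapshot that is safe to iterate concurrently.
  List<CompressionCodec> codecs = new ArrayList<CompressionCodec>();
  for (CompressionCodec codec : ServiceLoader.load(CompressionCodec.class)) {
    codecs.add(codec);
  }
  CODEC_PROVIDERS = Collections.unmodifiableList(codecs);
}
{code}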

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8276) Auto-HA: add config for java options to pass to zkfc daemon

2012-05-12 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8276.
-

   Resolution: Fixed
Fix Version/s: Auto Failover (HDFS-3042)
 Hadoop Flags: Reviewed

 Auto-HA: add config for java options to pass to zkfc daemon
 ---

 Key: HADOOP-8276
 URL: https://issues.apache.org/jira/browse/HADOOP-8276
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: Auto Failover (HDFS-3042)

 Attachments: hadoop-8276.txt


 Currently the zkfc daemon is started without any ability to specify java 
 options for it. We should be add a flag so heap size, etc can be specified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8397) NPE thrown when IPC layer gets an EOF reading a response

2012-05-12 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8397:
---

 Summary: NPE thrown when IPC layer gets an EOF reading a response
 Key: HADOOP-8397
 URL: https://issues.apache.org/jira/browse/HADOOP-8397
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Priority: Critical


When making a call on an IPC connection where the other end has shut down, I 
see the following exception:
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)
from the lines:
{code}
RpcResponseHeaderProto response =
    RpcResponseHeaderProto.parseDelimitedFrom(in);
int callId = response.getCallId();
{code}
This is because parseDelimitedFrom() returns null in the case that the next 
thing to be read on the stream is an EOF.
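So the fix is presumably a null check that turns the EOF into a sensible 
exception, along these lines (sketch):
{code}
RpcResponseHeaderProto response =
    RpcResponseHeaderProto.parseDelimitedFrom(in);
if (response == null) {
  // parseDelimitedFrom() returns null on a clean EOF: the server went away.
  throw new EOFException("Connection closed while awaiting RPC response");
}
int callId = response.getCallId();
{code}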

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Sailfish

2012-05-11 Thread Todd Lipcon
  
   Sailfish tries to improve Hadoop performance, particularly for
  large jobs
   which process TBs of data and run for hours.  In building Sailfish,
 we
   modify how map-output is handled and transported from map-reduce.
  
   The project pages provide more information about the project.
  
   We are looking for collaborators who can help get some of the ideas
  into
   Apache Hadoop. A possible step forward could be to make the shuffle
  phase of
   Hadoop pluggable.
  
   If you are interested in working with us, please get in touch with
 me.
  
   Sriram
  
  
  
  
  --
  Best regards,
  
     - Andy
  
  Problems worthy of attack prove their worth by hitting back. - Piet
  Hein (via Tom White)
  
  
  
 




-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-8362) Improve exception message when Configuration.set() is called with a null key or value

2012-05-04 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8362:
---

 Summary: Improve exception message when Configuration.set() is 
called with a null key or value
 Key: HADOOP-8362
 URL: https://issues.apache.org/jira/browse/HADOOP-8362
 Project: Hadoop Common
  Issue Type: Improvement
  Components: conf
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Priority: Trivial


Currently, calling Configuration.set(...) with a null value results in a 
NullPointerException within Properties.setProperty. We should check for null 
key/value and throw a better exception.
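For example, a pair of guards at the top of set(), using Guava's Preconditions 
(a sketch):
{code}
Preconditions.checkArgument(name != null, "Property name must not be null");
Preconditions.checkArgument(value != null,
    "The value of property %s must not be null", name);
{code}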

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8344) Improve test-patch to make it easier to find javadoc warnings

2012-05-02 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8344:
---

 Summary: Improve test-patch to make it easier to find javadoc 
warnings
 Key: HADOOP-8344
 URL: https://issues.apache.org/jira/browse/HADOOP-8344
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build, test
Reporter: Todd Lipcon
Priority: Minor


Often I have to spend a lot of time digging through logs to find javadoc 
warnings as the result of a test-patch. Similar to the improvement made in 
HADOOP-8339, we should do the following:
- test-patch should only run javadoc on modules that have changed
- the exclusions OK_JAVADOC should be per-project rather than cross-project
- rather than just have a number, we should check in the actual list of 
warnings to ignore and then fuzzy-match the patch warnings against the exclude 
list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8279) Auto-HA: Allow manual failover to be invoked from zkfc.

2012-05-02 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8279.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch, thanks Aaron.

 Auto-HA: Allow manual failover to be invoked from zkfc.
 ---

 Key: HADOOP-8279
 URL: https://issues.apache.org/jira/browse/HADOOP-8279
 Project: Hadoop Common
  Issue Type: Improvement
  Components: auto-failover, ha
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Mingjie Lai
Assignee: Todd Lipcon
 Fix For: Auto Failover (HDFS-3042)

 Attachments: hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt, 
 hadoop-8279.txt, hadoop-8279.txt


 HADOOP-8247 introduces a config flag to prevent potential status 
 inconsistency between zkfc and namenode, by making auto and manual failover 
 mutually exclusive.
 However, as described in 2.7.2 section of design doc at HDFS-2185, we should 
 allow manual and auto failover co-exist, by:
 - adding some rpc interfaces at zkfc
 - manual failover shall be triggered by haadmin, and handled by zkfc if auto 
 failover is enabled. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8340) SNAPSHOT build versions should compare as less than their eventual final release

2012-05-01 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8340:
---

 Summary: SNAPSHOT build versions should compare as less than their 
eventual final release
 Key: HADOOP-8340
 URL: https://issues.apache.org/jira/browse/HADOOP-8340
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


We recently added a utility function to compare two version strings, based on 
splitting on '.'s and comparing each component. However, it considers a version 
like 2.0.0-SNAPSHOT as being greater than 2.0.0. This isn't right, since 
SNAPSHOT builds come before the final release.
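One possible shape for the fix; compareDottedComponents() stands in for the 
existing '.'-splitting comparison:
{code}
static int compareVersions(String v1, String v2) {
  boolean snap1 = v1.endsWith("-SNAPSHOT");
  boolean snap2 = v2.endsWith("-SNAPSHOT");
  String base1 = snap1 ? v1.substring(0, v1.length() - "-SNAPSHOT".length()) : v1;
  String base2 = snap2 ? v2.substring(0, v2.length() - "-SNAPSHOT".length()) : v2;
  int cmp = compareDottedComponents(base1, base2);
  if (cmp != 0) {
    return cmp;
  }
  // Same base version: the SNAPSHOT sorts before the final release.
  return (snap1 == snap2) ? 0 : (snap1 ? -1 : 1);
}
{code}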

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8315) Support SASL-authenticated ZooKeeper in ActiveStandbyElector

2012-04-25 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8315:
---

 Summary: Support SASL-authenticated ZooKeeper in 
ActiveStandbyElector
 Key: HADOOP-8315
 URL: https://issues.apache.org/jira/browse/HADOOP-8315
 Project: Hadoop Common
  Issue Type: Improvement
  Components: auto-failover, ha
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon


Currently, if you try to use SASL-authenticated ZK with the 
ActiveStandbyElector, you run into a couple issues:
1) We hit ZOOKEEPER-1437 - we need to wait until we see SaslAuthenticated 
before we can make any requests
2) We currently throw a fatalError when we see the SaslAuthenticated callback 
on the connection watcher

We need to wait for ZK-1437 upstream, and then upgrade to the fixed version for 
#1. For #2 we just need to add a case there and ignore it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8306) ZKFC: improve error message when ZK is not running

2012-04-24 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8306.
-

   Resolution: Fixed
Fix Version/s: Auto Failover (HDFS-3042)
 Hadoop Flags: Reviewed

Committed to branch, thanks Eli

 ZKFC: improve error message when ZK is not running
 --

 Key: HADOOP-8306
 URL: https://issues.apache.org/jira/browse/HADOOP-8306
 Project: Hadoop Common
  Issue Type: Improvement
  Components: auto-failover, ha
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: Auto Failover (HDFS-3042)

 Attachments: hadoop-8306.txt


 Currently if you start the ZKFC without starting ZK, you get an ugly stack 
 trace. We should improve the error message and give it a unique exit code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8306) ZKFC: improve error message when ZK is not running

2012-04-23 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8306:
---

 Summary: ZKFC: improve error message when ZK is not running
 Key: HADOOP-8306
 URL: https://issues.apache.org/jira/browse/HADOOP-8306
 Project: Hadoop Common
  Issue Type: Improvement
  Components: auto-failover, ha
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


Currently if you start the ZKFC without starting ZK, you get an ugly stack 
trace. We should improve the error message and give it a unique exit code.
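
A minimal, hypothetical sketch of the intended behavior (the message text and
the exit code value here are illustrative):

{code}
import java.io.IOException;

public class ZkfcStartupSketch {
  // Hypothetical dedicated exit code for "ZooKeeper is not running".
  private static final int ERR_CODE_NO_ZK = 10;

  public static void main(String[] args) {
    String zkQuorum = "localhost:2181";
    try {
      connectToZK(zkQuorum); // stand-in for the real ZK connection setup
    } catch (IOException ioe) {
      System.err.println("Unable to connect to ZooKeeper quorum at "
          + zkQuorum + ". ZKFC requires a running ZooKeeper ensemble.");
      System.exit(ERR_CODE_NO_ZK);
    }
  }

  private static void connectToZK(String quorum) throws IOException {
    throw new IOException("Connection refused"); // simulate ZK being down
  }
}
{code}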

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8295) ToolRunner.confirmPrompt spins if stdin goes away

2012-04-19 Thread Todd Lipcon (Created) (JIRA)
ToolRunner.confirmPrompt spins if stdin goes away
-

 Key: HADOOP-8295
 URL: https://issues.apache.org/jira/browse/HADOOP-8295
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Priority: Minor


Currently, ToolRunner.confirmPrompt treats a -1 response to its 
System.in.read() call the same as \n or \r. So, if stdin goes away, it can 
cause the JVM to spin in a loop. Instead, it should treat a -1 result as an 
'n' response.
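
A minimal sketch of the fix, treating EOF from stdin as an explicit "no":

{code}
import java.io.IOException;

public class ConfirmPromptSketch {
  public static boolean confirmPrompt(String prompt) throws IOException {
    while (true) {
      System.out.print(prompt + " (Y or N) ");
      StringBuilder response = new StringBuilder();
      while (true) {
        int c = System.in.read();
        if (c == -1) {
          // stdin went away: answer "no" instead of spinning forever
          System.out.println();
          return false;
        }
        if (c == '\r' || c == '\n') {
          break;
        }
        response.append((char) c);
      }
      String r = response.toString().toLowerCase();
      if (r.equals("y") || r.equals("yes")) {
        return true;
      }
      if (r.equals("n") || r.equals("no")) {
        return false;
      }
      // anything else: re-prompt
    }
  }
}
{code}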

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8258) Add interfaces for compression codecs to use direct byte buffers

2012-04-19 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8258.
-

Resolution: Duplicate

Resolving as dup - please see HADOOP-8148.

 Add interfaces for compression codecs to use direct byte buffers
 

 Key: HADOOP-8258
 URL: https://issues.apache.org/jira/browse/HADOOP-8258
 Project: Hadoop Common
  Issue Type: New Feature
  Components: io, native, performance
Affects Versions: 3.0.0
Reporter: Todd Lipcon

 Currently, the codec interface only provides input/output functions based on 
 byte arrays. Given that most of the codecs are implemented in native code, 
 this necessitates two extra copies - one to copy the input data to a direct 
 buffer, and one to copy the output data back to a byte array. We should add 
 interfaces to Decompressor/Compressor that can work directly with direct byte 
 buffers to avoid these copies.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8276) Auto-HA: add config for java options to pass to zkfc daemon

2012-04-13 Thread Todd Lipcon (Created) (JIRA)
Auto-HA: add config for java options to pass to zkfc daemon
---

 Key: HADOOP-8276
 URL: https://issues.apache.org/jira/browse/HADOOP-8276
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


Currently the zkfc daemon is started without any ability to specify java 
options for it. We should add a flag so that heap size, etc. can be specified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8277) Auto-HA: add basic HTTP interface to ZKFC

2012-04-13 Thread Todd Lipcon (Created) (JIRA)
Auto-HA: add basic HTTP interface to ZKFC
-

 Key: HADOOP-8277
 URL: https://issues.apache.org/jira/browse/HADOOP-8277
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ha
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Currently the ZKFC exposes no interfaces at all. It would be useful to add a 
very basic web interface, which shows its current status. This would also 
expose the usual /jmx, /conf, etc. servlets for easier debugging and monitoring.
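
A generic sketch of such a status endpoint using only the JDK's built-in HTTP
server (not Hadoop's servlet infrastructure; the port is illustrative):

{code}
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class ZkfcStatusHttpSketch {
  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(8019), 0);
    server.createContext("/status", exchange -> {
      byte[] body = "ZKFC state: ACTIVE\n".getBytes("UTF-8");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
      }
    });
    server.start(); // /jmx, /conf etc. would be additional contexts
  }
}
{code}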

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8272) BytesWritable length problem

2012-04-12 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8272.
-

Resolution: Invalid

This is not a bug - this is the expected behavior of getBytes(). Please refer 
to the javadoc:
{code}

  /**
   * Get the data backing the BytesWritable. Please use {@link #copyBytes()}
   * if you need the returned array to be precisely the length of the data.
   * @return The data is only valid between 0 and getLength() - 1.
   */
{code}
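
A minimal demonstration of the documented behavior, assuming a BytesWritable
version that provides copyBytes():

{code}
import org.apache.hadoop.io.BytesWritable;

public class BytesWritableDemo {
  public static void main(String[] args) {
    BytesWritable bw = new BytesWritable();
    bw.setSize(100);                            // capacity grows to 1.5x = 150
    System.out.println(bw.getBytes().length);   // 150: padded backing array
    System.out.println(bw.getLength());         // 100: length of valid data
    System.out.println(bw.copyBytes().length);  // 100: exact-length copy
  }
}
{code}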

 BytesWritable length problem
 

 Key: HADOOP-8272
 URL: https://issues.apache.org/jira/browse/HADOOP-8272
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.205.0
Reporter: Simon Gilliot
  Labels: hadoop
   Original Estimate: 1h
  Remaining Estimate: 1h

 I tried to create my own Writable which contains a BytesWritable.
 In my constructor, I created an empty BytesWritable:
 BytesWritable key = new BytesWritable();
 Next, in my readFields, I did:
 key.readFields(in); LOG.debug(Bytes.toString(key.getBytes()));
 The key contained many more bytes than I had written.
 In fact, if my BytesWritable contains 100 bytes, I think that the 
 readFields() of BytesWritable calls:
 * setSize(0) (which seems useless, since the values in the old range are 
 preserved and any new values are undefined)
 * setSize(100), which extends the bytes array (via setCapacity) to 1.5 times 
 the size (so 150) without initializing it
 * readFully(bytes, 0, 100), which fills the bytes array from offset 0 to 100.
 And when I call getBytes() on it, the 150-byte backing array is returned 
 without any bounds control.
 It seems possible that the same problem happens in other conditions whenever 
 we increase the bytes array size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8247) Auto-HA: add a config to enable auto-HA, which disables manual FC

2012-04-10 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8247.
-

   Resolution: Fixed
Fix Version/s: Auto Failover (HDFS-3042)
 Hadoop Flags: Reviewed

Committed to branch, thx for reviews all.

 Auto-HA: add a config to enable auto-HA, which disables manual FC
 -

 Key: HADOOP-8247
 URL: https://issues.apache.org/jira/browse/HADOOP-8247
 Project: Hadoop Common
  Issue Type: Improvement
  Components: auto-failover, ha
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Auto Failover (HDFS-3042)

 Attachments: hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt, 
 hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt


 Currently, if automatic failover is set up and running, and the user uses the 
 haadmin -failover command, he or she can end up putting the system in an 
 inconsistent state, where the state in ZK disagrees with the actual state of 
 the world. To fix this, we should add a config flag which is used to enable 
 auto-HA. When this flag is set, we should disallow use of the haadmin command 
 to initiate failovers. We should refuse to run ZKFCs when the flag is not 
 set. Of course, this flag should be scoped by nameservice.
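
A minimal, hypothetical sketch of the gating logic (the configuration key name
here is illustrative, not necessarily what the patch uses):

{code}
import org.apache.hadoop.conf.Configuration;

public class ManualFailoverGateSketch {
  static final String AUTO_HA_KEY = "dfs.ha.automatic-failover.enabled";

  static void checkManualFailoverAllowed(Configuration conf) {
    if (conf.getBoolean(AUTO_HA_KEY, false)) {
      throw new IllegalStateException(
          "Manual failover via haadmin is disallowed while automatic"
          + " failover is enabled for this nameservice.");
    }
  }
}
{code}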

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8263) Stringification of IPC calls not useful

2012-04-09 Thread Todd Lipcon (Created) (JIRA)
Stringification of IPC calls not useful
---

 Key: HADOOP-8263
 URL: https://issues.apache.org/jira/browse/HADOOP-8263
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


Since the Protobufification of Hadoop, the log messages on IPC exceptions on 
the server side now read like:

12/04/09 16:04:06 INFO ipc.Server: IPC Server handler 9 on 8021, call 
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWritable@7087e9bf from 
127.0.0.1:47989: error: org.apache.hadoop.ipc.StandbyException: Operation 
category READ is not supported in state standby

The call should instead stringify the method name and the request protobuf 
(perhaps abbreviated if it is longer than a few hundred chars).
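
A minimal sketch of such a stringification, assuming the protobuf request
message is available at logging time:

{code}
import com.google.protobuf.Message;
import com.google.protobuf.TextFormat;

public class CallStringifierSketch {
  private static final int MAX_LEN = 300; // abbreviate very long requests

  static String toTraceString(String methodName, Message request) {
    String req = TextFormat.shortDebugString(request);
    if (req.length() > MAX_LEN) {
      req = req.substring(0, MAX_LEN) + "...";
    }
    return methodName + " {" + req + "}";
  }
}
{code}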

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8262) Between mapper and reducer, Hadoop inserts spaces into my string

2012-04-08 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8262.
-

Resolution: Invalid

This sounds like a bug in your code, or a misunderstanding, rather than a bug 
in MR. Please feel free to contact the mapreduce-user list for help.

 Between mapper and reducer, Hadoop inserts spaces into my string
 

 Key: HADOOP-8262
 URL: https://issues.apache.org/jira/browse/HADOOP-8262
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.0
 Environment: Eclipse plugin, Windows
Reporter: Adriana Sbircea

 In the mapper I send a number as the key, and another number (which has 
 more than one digit) as the value, but I send them as Text objects. In my 
 reducer, all the values for a key have spaces between every digit of the 
 value. I can't complete my task because of this problem. 
 I don't use combiners or anything else. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8260) Auto-HA: Replace ClientBaseWithFixes with our own modified copy of the class

2012-04-07 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8260.
-

   Resolution: Fixed
Fix Version/s: Auto Failover (HDFS-3042)
 Hadoop Flags: Reviewed

Committed to branch. I ran all of the tests which inherit from this class 
before committing.

 Auto-HA: Replace ClientBaseWithFixes with our own modified copy of the class
 

 Key: HADOOP-8260
 URL: https://issues.apache.org/jira/browse/HADOOP-8260
 Project: Hadoop Common
  Issue Type: Test
  Components: auto-failover, test
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: Auto Failover (HDFS-3042)

 Attachments: hadoop-8260.txt


 The class ClientBaseWithFixes is an attempt to add some workaround code to 
 avoid spurious failures due to ZOOKEEPER-1438. But, even after making those 
 workarounds, I've seen a few Jenkins failures due to that issue. Until ZK 
 fixes this issue, I'd like to just copy the test infrastructure into our own 
 code, and remove the problematic JMXEnv verifications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8257) Auto-HA: TestZKFailoverControllerStress occasionally fails with Mockito error

2012-04-06 Thread Todd Lipcon (Created) (JIRA)
Auto-HA: TestZKFailoverControllerStress occasionally fails with Mockito error
-

 Key: HADOOP-8257
 URL: https://issues.apache.org/jira/browse/HADOOP-8257
 Project: Hadoop Common
  Issue Type: Bug
  Components: auto-failover, test
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial
 Attachments: hadoop-8257.txt

Once in a while I've seen the following in TestZKFailoverControllerStress:

 Unfinished stubbing detected here: - at 
org.apache.hadoop.ha.TestZKFailoverControllerStress.testRandomHealthAndDisconnects(TestZKFailoverControllerStress.java:118)
  E.g. thenReturn() may be missing

This is because we set up the mock answers _after_ starting the ZKFCs. So if 
the ZKFC calls the mock object while it's in the middle of the setup, this 
exception occurs.
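
A minimal sketch of the race and the fix, with a stand-in interface for the
mocked object:

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class StubbingOrderSketch {
  interface HealthMonitor { boolean isHealthy(); }

  public static void main(String[] args) {
    HealthMonitor hm = mock(HealthMonitor.class);
    // Fix: finish all stubbing *before* starting the thread (the ZKFC in the
    // real test) that invokes the mock. If a call from another thread lands
    // while when(...) is still pending, Mockito reports unfinished stubbing.
    when(hm.isHealthy()).thenReturn(true);
    new Thread(() -> System.out.println(hm.isHealthy())).start();
  }
}
{code}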

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8257) Auto-HA: TestZKFailoverControllerStress occasionally fails with Mockito error

2012-04-06 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8257.
-

   Resolution: Fixed
Fix Version/s: Auto Failover (HDFS-3042)
 Hadoop Flags: Reviewed

 Auto-HA: TestZKFailoverControllerStress occasionally fails with Mockito error
 -

 Key: HADOOP-8257
 URL: https://issues.apache.org/jira/browse/HADOOP-8257
 Project: Hadoop Common
  Issue Type: Bug
  Components: auto-failover, test
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial
 Fix For: Auto Failover (HDFS-3042)

 Attachments: hadoop-8257.txt


 Once in a while I've seen the following in TestZKFailoverControllerStress:
  Unfinished stubbing detected here: - at 
 org.apache.hadoop.ha.TestZKFailoverControllerStress.testRandomHealthAndDisconnects(TestZKFailoverControllerStress.java:118)
   E.g. thenReturn() may be missing
 This is because we set up the mock answers _after_ starting the ZKFCs. So if 
 the ZKFC calls the mock object while it's in the middle of the setup, this 
 exception occurs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8258) Add interfaces for compression codecs to use direct byte buffers

2012-04-06 Thread Todd Lipcon (Created) (JIRA)
Add interfaces for compression codecs to use direct byte buffers


 Key: HADOOP-8258
 URL: https://issues.apache.org/jira/browse/HADOOP-8258
 Project: Hadoop Common
  Issue Type: New Feature
  Components: io, native
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently, the codec interface only provides input/output functions based on 
byte arrays. Given that most of the codecs are implemented in native code, this 
necessitates two extra copies - one to copy the input data to a direct buffer, 
and one to copy the output data back to a byte array. We should add interfaces 
to Decompressor/Compressor that can work directly with direct byte buffers to 
avoid these copies.
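
A minimal, hypothetical sketch of the kind of interface this suggests (the
name and shape are illustrative, not the eventual API):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Decompress directly between ByteBuffers, reading between src's position
 * and limit and writing between dst's position and limit, so native codecs
 * can avoid intermediate byte[] copies.
 */
public interface DirectDecompressorSketch {
  void decompress(ByteBuffer src, ByteBuffer dst) throws IOException;
}
{code}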

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8260) Auto-HA: Replace ClientBaseWithFixes with our own modified copy of the class

2012-04-06 Thread Todd Lipcon (Created) (JIRA)
Auto-HA: Replace ClientBaseWithFixes with our own modified copy of the class


 Key: HADOOP-8260
 URL: https://issues.apache.org/jira/browse/HADOOP-8260
 Project: Hadoop Common
  Issue Type: Test
  Components: auto-failover, test
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hadoop-8260.txt

The class ClientBaseWithFixes is an attempt to add some workaround code to 
avoid spurious failures due to ZOOKEEPER-1438. But, even after making those 
workarounds, I've seen a few Jenkins failures due to that issue. Until ZK fixes 
this issue, I'd like to just copy the test infrastructure into our own code, 
and remove the problematic JMXEnv verifications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-8259) Auto-HA: Replace ClientBaseWithFixes with our own modified copy of the class

2012-04-06 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-8259.
-

Resolution: Duplicate

JIRA being funky today... clicked create once but got two copies. Dup of 
HADOOP-8260

 Auto-HA: Replace ClientBaseWithFixes with our own modified copy of the class
 

 Key: HADOOP-8259
 URL: https://issues.apache.org/jira/browse/HADOOP-8259
 Project: Hadoop Common
  Issue Type: Test
  Components: auto-failover, test
Affects Versions: Auto Failover (HDFS-3042)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor

 The class ClientBaseWithFixes is an attempt to add some workaround code to 
 avoid spurious failures due to ZOOKEEPER-1438. But, even after making those 
 workarounds, I've seen a few Jenkins failures due to that issue. Until ZK 
 fixes this issue, I'd like to just copy the test infrastructure into our own 
 code, and remove the problematic JMXEnv verifications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8251) SecurityUtil.fetchServiceTicket broken after HADOOP-6941

2012-04-05 Thread Todd Lipcon (Created) (JIRA)
SecurityUtil.fetchServiceTicket broken after HADOOP-6941


 Key: HADOOP-8251
 URL: https://issues.apache.org/jira/browse/HADOOP-8251
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 1.1.0, 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Attachments: hadoop-8251.txt

HADOOP-6941 replaced direct references to some classes with reflective access 
so as to support other JDKs. Unfortunately there was a mistake in the name of 
the Krb5Util class, which broke fetchServiceTicket. This manifests itself as 
the inability to run checkpoints or other krb5-SSL HTTP-based transfers:

java.lang.ClassNotFoundException: sun.security.jgss.krb5
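
A minimal illustration of the failure mode: reflective class lookups are only
checked at runtime, so a package name passed where a class name is expected
fails exactly like this.

{code}
public class ReflectiveLookupSketch {
  public static void main(String[] args) throws Exception {
    // Broken: "sun.security.jgss.krb5" names a package, not a class.
    // Class.forName("sun.security.jgss.krb5");  // -> ClassNotFoundException
    // Intended: the fully qualified class name.
    Class<?> krb5Util = Class.forName("sun.security.jgss.krb5.Krb5Util");
    System.out.println(krb5Util.getName());
  }
}
{code}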

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



