Re: data node reads blocks sequentially from disk

2020-03-11 Thread Todd Lipcon
We also issue explicit readahead via fadvise since 2011 or so, so the
typical I/O sizes hitting the device are large enough to max out the
throughput, at least for typical spinning disks.

Todd

On Wed, Mar 11, 2020, 9:46 AM Kihwal Lee 
wrote:

> When Datanode was initially designed, Linux AIO was still early in its
> adoption. Kernel support was there and the libraries were almost there. No
> java support, of course.  We would have to write a lot of native code for
> it and use JNI. Also, AIO means bypassing kernel page cache since you are
> doing it with O_DIRECT. We would have to implement some sort of block data
> caching on our own.
>
> Another option was to build an async framework in the datanode. Instead, the
> community chose to use a pool of data transceiver threads in order to move
> forward fast.  There have been some discussions and efforts to improve this,
> as the workload has changed since the early days.  However, the current
> approach still utilizes the I/O schedulers on block devices, so you will see
> a lot of I/O combining happening for typical loads.  These are not direct
> I/O, so read-ahead does happen and the page cache is utilized.
>
> Kihwal
>
>
>
> On Wed, Mar 11, 2020 at 11:18 AM Wei-Chiu Chuang 
> wrote:
>
> > Hi David,
> > We talked a bit about a similar topic on DataNode sockets a while back.
> Any
> > feedback on the DataNode disk access?
> >
> > On Thu, Mar 5, 2020 at 4:16 PM Mania Abdi  wrote:
> >
> > > Hello everyone
> > >
> > > I have a question regarding HDFS, data node code version 2.7.2. I have
> > > posted my question as Jira issue
> > > .
> > >
> > > I have observed that the datanode issues sequential, synchronous 64KB
> > > reads to the local disk, sends the data to the user, and then waits for
> > > an acknowledgement from the user. I was wondering why the HDFS community
> > > did not use file mapping or asynchronous reads from disk? That could
> > > allow the disk scheduler to perform sequential reads, or to perform
> > > read-ahead and prefetching. Is this something that could lead to a
> > > performance improvement or not?
> > >
> > > I would appreciate it if you could help me find the answer to this
> > > question from the Hadoop community's perspective.
> > >
> > > I asked some Apache members, and they told me that the version I am
> > > pointing to is old and that this part of the code was rewritten from
> > > scratch for modern SSDs. Could you please help me find in which version
> > > this modification happened, and where I can find it?
> > >
> > > Many thanks
> > > Mania
> > >
> >
>


[jira] [Resolved] (HDFS-14535) The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots of heap allocation in HBase when using short-circut read

2019-06-04 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-14535.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0

> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is 
> causing lots of heap allocation in HBase when using short-circut read
> --
>
> Key: HDFS-14535
> URL: https://issues.apache.org/jira/browse/HDFS-14535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14535.patch
>
>
> Our HBase team is trying to read blocks from HDFS into pooled off-heap 
> ByteBuffers directly (HBASE-21879). In some recent benchmarks we found that 
> almost 45% of heap allocation came from the DFS client. The heap 
> allocation flame graph can be seen here: 
> https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
> After checking the code path, we found that when requesting file descriptors 
> from a DomainPeer, a huge 8KB buffer is allocated for the 
> BufferedOutputStream, even though the protocol content is quite small, just 
> a few bytes.
> This creates heavy GC pressure for HBase when cacheHitRatio < 60%, which 
> increases the HBase P999 latency. We can instead pre-allocate a small 
> buffer for the BufferedOutputStream, such as 512 bytes; that is enough for 
> the short-circuit fd protocol content. We've created a patch that does 
> this, and the allocation flame graph shows that, after the patch, heap 
> allocation from the DFS client dropped from 45% to 27%, which I think is a 
> very good result. See: 
> https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
> We hope this attached patch can be merged into HDFS trunk and also 
> Hadoop-2.8.x; HBase will benefit a lot from it. 
> Thanks. 
> For more details, see: 
> https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14482) Crash when using libhdfs with bad classpath

2019-05-14 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-14482.

Resolution: Fixed

> Crash when using libhdfs with bad classpath
> ---
>
> Key: HDFS-14482
> URL: https://issues.apache.org/jira/browse/HDFS-14482
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>    Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
>
> HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the 
> env but before checking whether it's null. In the case that getJNIEnv() fails 
> to create an env, it returns NULL, and then we crash when calling 
> initCachedClasses() on line 555
> {code}
> 551 state->env = getGlobalJNIEnv();
> 552 mutexUnlock(&jvmMutex);
> 553 
> 554 jthrowable jthr = NULL;
> 555 jthr = initCachedClasses(state->env);
> 556 if (jthr) {
> 557   printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL,
> 558 "initCachedClasses failed");
> 559   goto fail;
> {code}






[jira] [Created] (HDFS-14482) Crash when using libhdfs with bad classpath

2019-05-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-14482:
--

 Summary: Crash when using libhdfs with bad classpath
 Key: HDFS-14482
 URL: https://issues.apache.org/jira/browse/HDFS-14482
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Todd Lipcon
Assignee: Sahil Takiar


HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the 
env but before checking whether it's null. In the case that getJNIEnv() fails 
to create an env, it returns NULL, and then we crash when calling 
initCachedClasses() on line 555
{code}
551 state->env = getGlobalJNIEnv();
552 mutexUnlock(&jvmMutex);
553 
554 jthrowable jthr = NULL;
555 jthr = initCachedClasses(state->env);
556 if (jthr) {
557   printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL,
558 "initCachedClasses failed");
559   goto fail;
{code}






[jira] [Created] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2018-11-28 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-14111:
--

 Summary: hdfsOpenFile on HDFS causes unnecessary IO from file 
offset 0
 Key: HDFS-14111
 URL: https://issues.apache.org/jira/browse/HDFS-14111
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, libhdfs
Affects Versions: 3.2.0
Reporter: Todd Lipcon


hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
the read(0) isn't short circuited, and results in the DFSClient opening a block 
reader. In the case of a remote block, the block reader will actually issue a 
read of the whole block, causing the datanode to perform unnecessary IO and 
network transfers in order to fill up the client's TCP buffers. This causes 
performance degradation.






[jira] [Resolved] (HDFS-10369) hdfsread crash when reading data reaches to 128M

2018-11-28 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-10369.

Resolution: Invalid

You're mallocing a buffer of 5 bytes here, seems your C code is just broken.

> hdfsread crash when reading data reaches to 128M
> 
>
> Key: HDFS-10369
> URL: https://issues.apache.org/jira/browse/HDFS-10369
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: vince zhang
>Priority: Major
>
> See the code below; it crashes after the second 
> printf("hdfsGetDefaultBlockSize2:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
>
> hdfsFile read_file = hdfsOpenFile(fs, "/testpath", O_RDONLY, 0, 0, 1);
> int total = hdfsAvailable(fs, read_file);
> printf("Total:%d\n", total);
> /* Note: sizeof(size+1) evaluates to the size of the integer type, not
>    size+1 bytes, so this buffer is far too small for the reads below. */
> char* buffer = (char*)malloc(sizeof(size+1) * sizeof(char));
> int ret = -1;
> int len = 0;
> ret = hdfsSeek(fs, read_file, 134152192);
> printf("hdfsGetDefaultBlockSize1:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
> ret = hdfsRead(fs, read_file, (void*)buffer, size);
> printf("hdfsGetDefaultBlockSize2:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
> ret = hdfsRead(fs, read_file, (void*)buffer, size);
> printf("hdfsGetDefaultBlockSize3:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
> return 0;






Re: [DISCUSS] Hadoop RPC encryption performance improvements

2018-11-02 Thread Todd Lipcon
One possibility (which we use in Kudu) is to use SSL for encryption but
with a self-signed certificate, maintaining the existing SASL/GSSAPI
handshake for authentication. The one important bit here, security-wise, is
to implement channel binding (RFC 5056 and RFC 5929) to protect against
MITM attacks. The description of the Kudu RPC protocol is here:
https://github.com/apache/kudu/blob/master/docs/design-docs/rpc.md#wire-protocol

If implemented correctly, this provides TLS encryption (with all of its
performance and security benefits) without requiring the user to deploy a
custom cert.

-Todd

On Thu, Nov 1, 2018 at 7:14 PM Konstantin Shvachko 
wrote:

> Hi Wei-Chiu,
>
> Thanks for starting the thread and summarizing the problem. Sorry for slow
> response.
> We've been looking at the encrypted performance as well and are interested
> in this effort.
> We ran some benchmarks locally. Our benchmarks also showed substantial
> penalty for turning on wire encryption on rpc.
> Although it was less drastic - more in the range of -40%. But we ran a
> different benchmark NNThroughputBenchmark, and we ran it on 2.6 last year.
> Could have published the results, but need to rerun on more recent
> versions.
>
> Three points from me on this discussion:
>
> 1. We should settle on the benchmarking tools.
> For development RPCCallBenchmark is good as it measures directly the
> improvement on the RPC layer. But for external consumption it is more
> important to know about e.g. NameNode RPCs performance. So we probably
> should run both benchmarks.
> 2. SASL vs SSL.
> Since current implementation is based on SASL, I think it would make sense
> to make improvements in this direction. I assume switching to SSL would
> require changes in configuration. Not sure if it will be compatible, since
> we don't have the details. At this point I would go with HADOOP-10768.
> Given all (Daryn's) concerns are addressed.
> 3. Performance improvement expectations.
> Ideally we want to have < 10% penalty for encrypted communication. Anything
> over 30% will probably have very limited usability. And there is the gray
> area in between, which could be mitigated by allowing mixed encrypted and
> un-encrypted RPCs on the single NameNode like in HDFS-13566.
>
> Thanks,
> --Konstantin
>
> On Wed, Oct 31, 2018 at 7:39 AM Daryn Sharp 
> wrote:
>
> > Various KMS tasks have been delaying my RPC encryption work – which is
> 2nd
> > on TODO list.  It's becoming a top priority for us so I'll try my best to
> > get a preliminary netty server patch (sans TLS) up this week if that
> helps.
> >
> > The two cited jiras had some critical flaws.  Skimming my comments, both
> > use blocking IO (obvious nonstarter).  HADOOP-10768 is a hand rolled
> > TLS-like encryption which I don't feel is something the community can or
> > should maintain from a security standpoint.
> >
> > Daryn
> >
> > On Wed, Oct 31, 2018 at 8:43 AM Wei-Chiu Chuang 
> > wrote:
> >
> > > Ping. Any one? Cloudera is interested in moving forward with the RPC
> > > encryption improvements, but I just like to get a consensus which
> > approach
> > > to go with.
> > >
> > > Otherwise I'll pick HADOOP-10768 since it's ready for commit, and I've
> > > spent time on testing it.
> > >
> > > On Thu, Oct 25, 2018 at 11:04 AM Wei-Chiu Chuang 
> > > wrote:
> > >
> > > > Folks,
> > > >
> > > > I would like to invite all to discuss the various Hadoop RPC
> encryption
> > > > performance improvements. As you probably know, Hadoop RPC encryption
> > > > currently relies on Java SASL and has _really_ bad performance: in
> > > > terms of RPCs per second, around 15~20% of the rate without SASL.
> > > >
> > > > There have been some attempts to address this, most notably,
> > HADOOP-10768
> > > > <https://issues.apache.org/jira/browse/HADOOP-10768> (Optimize
> Hadoop
> > > RPC
> > > > encryption performance) and HADOOP-13836
> > > > <https://issues.apache.org/jira/browse/HADOOP-13836> (Securing
> Hadoop
> > > RPC
> > > > using SSL). But it looks like both attempts have not been
> progressing.
> > > >
> > > > During the recent Hadoop contributor meetup, Daryn Sharp mentioned
> he's
> > > > working on another approach that leverages Netty for its SSL
> > encryption,
> > > > and then integrate Netty with Hadoop RPC so that Hadoop RPC
> > automatically
> > > > benefits from netty's SSL encryption performance.
> > > >
> > > > So there are at least 3 attempts to address this issue as I see it.
> Do
> > we
> > > > have a consensus that:
> > > > 1. this is an important problem
> > > > 2. which approach we want to move forward with
> > > >
> > > > --
> > > > A very happy Hadoop contributor
> > > >
> > >
> > >
> > > --
> > > A very happy Hadoop contributor
> > >
> >
> >
> > --
> >
> > Daryn
> >
>


-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-13826) Add a hidden configuration for NameNode to generate fake block locations

2018-08-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13826:
--

 Summary: Add a hidden configuration for NameNode to generate fake 
block locations
 Key: HDFS-13826
 URL: https://issues.apache.org/jira/browse/HDFS-13826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Todd Lipcon
Assignee: Todd Lipcon


In doing testing and benchmarking of the NameNode and dependent systems, it's 
often useful to be able to use an fsimage provided by some production system in 
a controlled environment without actually having access to any of the data. For 
example, while doing some recent work on Apache Impala I was trying to optimize 
the transmission and storage of block locations and tokens and measure the 
results based on metadata from a production user. In order to achieve this, it 
would be useful for the NN to expose a developer-only (undocumented) 
configuration to generate fake block locations and return them to callers. The 
"fake" locations should be randomly distributed across a fixed set of fake 
datanodes.






[jira] [Created] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-07-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13747:
--

 Summary: Statistic for list_located_status is incremented 
incorrectly by listStatusIterator
 Key: HDFS-13747
 URL: https://issues.apache.org/jira/browse/HDFS-13747
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.3
Reporter: Todd Lipcon









Re: [DISCUSS]: securing ASF Hadoop releases out of the box

2018-07-05 Thread Todd Lipcon
o here?
> >
> > Some things to think about
> >
> > * docs explaining IN CAPITAL LETTERS why you need to lock down your
> > cluster to a private subnet or use Kerberos
> > * Anything which can be done to make Kerberos easier (?). I see
> there are
> > some oustanding patches for HADOOP-12649 which need review, but what
> else?
> >
> > Could we have Hadoop determine when it's coming up on an open
> network and
> > start warning? And how?
> >
> > At the very least, single node hadoop should be locked down. You
> shouldn't
> > have to bring up kerberos to run it like that. And for more
> sophisticated
> > multinode deployments, should the scripts refuse to work without kerberos
> > unless you pass in some argument like "-Dinsecure-clusters-permitted"?
> >
> > Any other ideas?
> >
> >
> > 
> -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
> >
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-13703) Avoid allocation of CorruptedBlocks hashmap when no corrupted blocks are hit

2018-06-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13703:
--

 Summary: Avoid allocation of CorruptedBlocks hashmap when no 
corrupted blocks are hit
 Key: HDFS-13703
 URL: https://issues.apache.org/jira/browse/HDFS-13703
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: performance
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The DFSClient creates a CorruptedBlocks object, which contains a HashMap, on 
every read call. In most cases, a read will not hit any corrupted blocks, and 
this hashmap is not used. It seems the JIT isn't smart enough to eliminate this 
allocation. We would be better off avoiding it and only allocating in the rare 
case when a corrupt block is hit.

Removing this allocation reduced CPU usage of a TeraValidate job by about 10%.






[jira] [Created] (HDFS-13702) HTrace hooks taking 10-15% CPU in DFS client when disabled

2018-06-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13702:
--

 Summary: HTrace hooks taking 10-15% CPU in DFS client when disabled
 Key: HDFS-13702
 URL: https://issues.apache.org/jira/browse/HDFS-13702
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: performance
Affects Versions: 3.0.0
Reporter: Todd Lipcon


I am seeing DFSClient.newReaderTraceScope take ~15% CPU in a teravalidate 
workload even when HTrace is disabled. This is because it stringifies several 
integers. We should avoid all allocation and stringification when htrace is 
disabled.






[jira] [Created] (HDFS-13701) Removal of logging guards regressed performance

2018-06-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13701:
--

 Summary: Removal of logging guards regressed performance
 Key: HDFS-13701
 URL: https://issues.apache.org/jira/browse/HDFS-13701
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: performance
Affects Versions: 3.0.0
Reporter: Todd Lipcon


HDFS-8971 removed various logging guards from hot methods in the DFS client. In 
theory using a format string with {} placeholders is equivalent, but in fact 
it's not equivalent when one or more of the variable arguments are primitives. 
To be passed as part of the varargs array, the primitives need to be boxed. I 
am seeing Integer.valueOf() inside BlockReaderLocal.read taking ~3% of CPU.






[jira] [Resolved] (HDFS-3653) 1.x: Add a retention period for purged edit logs

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3653.
---
Resolution: Won't Fix

> 1.x: Add a retention period for purged edit logs
> 
>
> Key: HDFS-3653
> URL: https://issues.apache.org/jira/browse/HDFS-3653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 1.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> Occasionally we have a bug which causes something to go wrong with edits 
> files. Even more occasionally the bug is such that the namenode mistakenly 
> deletes an {{edits}} file without merging it into {{fsimage}} properly -- e.g 
> if the bug mistakenly writes an OP_INVALID at the top of the log.
> In trunk/2.0 we retain many edit log segments going back in time to be more 
> robust to this kind of error. I'd like to implement something similar (but 
> much simpler) in 1.x, which would be used only by HDFS developers in 
> root-causing or repairing from these rare scenarios: the NN should never 
> directly delete an edit log file. Instead, it should rename the file into 
> some kind of "trash" directory inside the name dir, and associate it with a 
> timestamp. Then, periodically a separate thread should scan the trash dirs 
> and delete any logs older than a configurable time.






[jira] [Resolved] (HDFS-3069) If an edits file has more edits in it than expected by its name, should trigger an error

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3069.
---
  Resolution: Won't Fix
Target Version/s:   (was: )

> If an edits file has more edits in it than expected by its name, should 
> trigger an error
> 
>
> Key: HDFS-3069
> URL: https://issues.apache.org/jira/browse/HDFS-3069
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.0, 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> In testing what happens in HA split brain scenarios, I ended up with an edits 
> log that was named edits_47-47 but actually had two edits in it (#47 and 
> #48). The edits loading process should detect this situation and barf. 
> Otherwise, the problem shows up later during loading or even on the next 
> restart, and is tough to fix.






Re: Datanode synchronization is horrible. I’m thinking we can use ReentrantReadWriteLock for synchronization. What do you guys think?

2015-02-17 Thread Todd Lipcon
You might also be interested in
https://issues.apache.org/jira/browse/HDFS-1148 which I worked on a bit a
number of years back. Per the last comment in that JIRA, I don't think it's
very valuable anymore given the predominance of short-circuit reads in high
performance workloads these days. If you've got some jstacks showing high
contention on these locks under some workload, though, it would be
interesting to see them.

-Todd

On Tue, Feb 17, 2015 at 2:18 PM, Colin P. McCabe  wrote:

> In general, the DN does not perform reads from files under a big lock.
> We only need the lock for protecting the replica map and some of the
> block state.  This lock hasn't really been a big problem in the past
> and I would hesitate to add complexity here (although I haven't
> thought about it that hard at all, so maybe I'm wrong!)
>
> Are you sure that you are not hitting HDFS-7489?
>
> In general, the client normally does some readahead of a few kb to
> avoid swamping the DN with tons of tiny requests.  Tons of tiny
> requests is a bad idea for many other reasons (RPC overhead, seek
> overhead, etc. etc.)
>
> You can also look into using short-circuit reads to avoid the DataNode
> overhead altogether for local reads, which a lot of high-performance
> systems do.
>
> regards,
> Colin
>
> On Sat, Feb 14, 2015 at 10:43 PM, Sukunhui (iEBP) 
> wrote:
> > I have a cluster writes/reads/deletes lots of small files.
> > I dump the stack of one Datenode and found out that Datanode has more
> than 100+ sessions for reading/writing blocks. 100+ DataXceiver threads
> waiting to lock <0x7f9b26ce9530> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> >
> > I find that DsDatasetImpl.java and ReplicaMap.java use a lot of
> `synchronized` keyword for synchronization. It’s horrible.
> > First, locking for every read is unnecessary and decreases concurrency.
> > Second, Java monitors (synchronized/wait/notify/notifyAll) are non-fair
> > (http://stackoverflow.com/questions/11275699/synchronized-release-order),
> > which will cause many dfsclient timeouts.
> >
> > I’m thinking we can use ReentrantReadWriteLock for synchronization. What
> do you guys think?
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-957) FSImage layout version should be only once file is complete

2015-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-957.
--
Resolution: Won't Fix

> FSImage layout version should be only once file is complete
> ---
>
> Key: HDFS-957
> URL: https://issues.apache.org/jira/browse/HDFS-957
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-957.txt
>
>
> Right now, the FSImage save code writes the LAYOUT_VERSION at the head of the 
> file, along with some other headers, and then dumps the directory into the 
> file. Instead, it should write a special IMAGE_IN_PROGRESS entry for the 
> layout version, dump all of the data, then seek back to the head of the file 
> to write the proper LAYOUT_VERSION. This would make it very easy to detect 
> the case where the FSImage save got interrupted.





[jira] [Resolved] (HDFS-3528) Use native CRC32 in DFS write path

2014-08-28 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3528.
---

   Resolution: Fixed
Fix Version/s: 2.6.0

> Use native CRC32 in DFS write path
> --
>
> Key: HDFS-3528
> URL: https://issues.apache.org/jira/browse/HDFS-3528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, performance
>Affects Versions: 2.0.0-alpha
>    Reporter: Todd Lipcon
>Assignee: James Thomas
> Fix For: 2.6.0
>
>
> HDFS-2080 improved the CPU efficiency of the read path by using native 
> SSE-enabled code for CRC verification. Benchmarks of the write path show that 
> it's often CPU bound by checksums as well, so we should make the same 
> improvement there.





Re: Updates on migration to git

2014-08-26 Thread Todd Lipcon
> 1. Since no one expressed any reservations against doing this on Sunday or
> renaming trunk to master, I'll go ahead and confirm that. I think that
> serves us better in the long run.
>
> 2. Arpit brought up the precommit builds - we should definitely fix them as
> soon as we can. I understand Giri maintains those builds; do we have anyone
> else who has access in case Giri is not reachable? Giri - please shout out
> if you can help us with this either on Sunday or Monday.
>
> Thanks
> Karthik
>
> On Fri, Aug 22, 2014 at 3:50 PM, Karthik Kambatla 
> wrote:
>
> > Also, does anyone know what we use for integration between JIRA and svn?
> > I am assuming svn2jira.
> >
> > On Fri, Aug 22, 2014 at 3:48 PM, Karthik Kambatla 
> > wrote:
> >
> > > Hi folks,
> > >
> > > For the SCM migration, feel free to follow
> > > https://issues.apache.org/jira/browse/INFRA-8195
> > >
> > > Most of this is planned to be handled this Sunday. As a result, the
> > > subversion repository would be read-only. If this is a major issue
> > > for you, please shout out.
> > >
> > > Daniel Gruno, the one helping us with the migration, was asking if we
> > > are open to renaming "trunk" to "master" to better conform to git
> > > lingo. I am tempted to say yes, but wanted to check.
> > >
> > > Would greatly appreciate any help with checking the git repo has
> > > everything.
> > >
> > > Thanks
> > > Karthik
> > > >> forwarding of this communication is strictly prohibited. If you have
> > > >> received this communication in error, please contact the sender
> > > >> immediately
> > > >> and delete it from your system. Thank You.
> > > >>
> > > >
> > > >
> > >
> > > --
> > > Mobile
> > >
> >
> > --
> > CONFIDENTIALITY NOTICE
> > NOTICE: This message is intended for the use of the individual or entity
> to
> > which it is addressed and may contain information that is confidential,
> > privileged and exempt from disclosure under applicable law. If the reader
> > of this message is not the intended recipient, you are hereby notified
> that
> > any printing, copying, dissemination, distribution, disclosure or
> > forwarding of this communication is strictly prohibited. If you have
> > received this communication in error, please contact the sender
> immediately
> > and delete it from your system. Thank You.
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-3278) Umbrella Jira for HDFS-HA Phase 2

2014-05-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3278.
---

   Resolution: Fixed
Fix Version/s: 2.1.0-beta
 Assignee: Todd Lipcon  (was: Sanjay Radia)

These subtasks were completed quite a while back.

> Umbrella Jira for HDFS-HA Phase 2
> -
>
> Key: HDFS-3278
> URL: https://issues.apache.org/jira/browse/HDFS-3278
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Sanjay Radia
>    Assignee: Todd Lipcon
> Fix For: 2.1.0-beta
>
>
> HDFS-1623 gives a high-level architecture and design for hot automatic 
> failover of the NN. Branch HDFS-1623 was merged into trunk for tactical 
> reasons even though the work for HA was not complete. Branch HDFS-1623 
> contained mechanisms for keeping a standby hot (i.e. reading from the shared 
> journal), dual block reports, fencing of DNs, a Zookeeper library for leader 
> election, etc. This umbrella jira covers the remaining work for HA and will 
> link all the jiras for the remaining work. Unlike HDFS-1623, no single branch 
> will be created - work will proceed in parallel branches.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: confirm expected HA behavior

2014-02-07 Thread Todd Lipcon
Hi Arpit,

The issue here is that our transaction log is not a proper "write-ahead
log". In fact, it is a "write-behind" log of sorts -- our general
operations look something like:

- lock namespace
- make a change to namespace
- write to log
- unlock namespace
- sync log

In the case of an active which has been superseded by another one, it only
finds out there is a problem on the "sync" step above. But, it has already
applied the edits to its own namespace. Given that we have no facility to
rollback the change at this point, our only option is to abort, or else
risk having an inconsistent namespace upon a later failover back to this
node.
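The ordering above - namespace change first, durable sync last - can be
sketched in miniature. `WriteBehindLog` below is an illustrative toy, not an
HDFS class; it only shows why a failed sync leaves an already-applied edit
with no rollback path other than aborting.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a "write-behind" log: the in-memory namespace is
// mutated under the lock, but the durable sync happens afterwards.
// All names here are illustrative, not actual HDFS classes.
class WriteBehindLog {
    private final List<String> applied = new ArrayList<>(); // in-memory namespace state
    private final List<String> synced = new ArrayList<>();  // durably logged edits
    private volatile boolean fenced = false;                 // another NN took over

    synchronized void apply(String edit) {
        applied.add(edit); // 1-4: lock, change namespace, buffer the edit, unlock
    }

    void sync() {
        // 5: sync outside the lock; only now do we learn we were fenced.
        if (fenced) {
            // The edit is already applied in memory and cannot be rolled
            // back, so the only safe reaction is to abort the process.
            throw new IllegalStateException("sync failed: superseded by a new active");
        }
        synced.addAll(applied.subList(synced.size(), applied.size()));
    }

    void fence() { fenced = true; }

    int appliedCount() { return applied.size(); }
    int syncedCount() { return synced.size(); }
}
```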

Another option might be to completely clear and reload the namespace --
essentially performing a within-process restart of the namenode. Given that
most people probably have some kind of cluster management software taking
care of automatically restarting crashed daemons, we figured it was simpler
to do a clean abort+reboot rather than implement the same thing within the
namenode -- thus avoiding any risk that we forget to "clear" any of our
state.

Another option would be to make our logging use a proper "write-ahead"
mechanism instead of the write-behind we do now. Doing this while
maintaining good performance isn't super simple.

There's some more background information on a JIRA filed a few years back
here: https://issues.apache.org/jira/browse/HDFS-1137

Hope that helps,

-Todd


On Wed, Feb 5, 2014 at 2:28 PM, Arpit Gupta  wrote:

> Hi
>
> I have a scenario where i am trying to test how HDFS HA works in case of
> network issues. I used iptables to block requests to the rpc port 8020 in
> order to simulate that. Below is the some info on what i did.
>
>
> NN1 - Active
> NN2 - Standby
>
> Using iptables stop port 8020 on NN1 (
> http://stackoverflow.com/questions/7423309/iptables-block-access-to-port-8000-except-from-ip-address
> )
> iptables -A INPUT -p tcp --dport 8020 -j DROP
>
> NN2 transitions to active.
>
> Run the following command to allow requests to port 8020 (
> http://stackoverflow.com/questions/10197405/iptables-remove-specific-rules
> )
> iptables -D INPUT -p tcp --dport 8020 -j DROP
>
> After this NN1 shut itself down with
>
> 2014-02-05 01:00:38,030 FATAL namenode.FSEditLog
> (JournalSet.java:mapJournalsAndReportErrors(354)) - Error: flush failed for
> required journal (JournalAndStream(mgr=QJM to [IP:8485],
> stream=QuorumOutputStream starting at txid 568))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
> exceptions to achieve quorum size 1/1. 1 exceptions thrown:
> 68.142.244.23:8485: IPC's epoch 1 is less than the last promised epoch 2
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:410)
>
>
> NN1 in this case shuts down with the above exception as it still believes
> it is active, hence there is an exception when talking to the JNs. Thus the
> operators would have to restart NN1, which could take a while based on the
> image size. Hence I was wondering if there is a better way to handle the
> above case, where we might transition to standby if exceptions like the
> above are seen.
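The "IPC's epoch 1 is less than the last promised epoch 2" message comes from
the journal's epoch-based fencing: once a journal node has promised a higher
epoch to the new active, writes from the old active are rejected. A hedged
sketch of that check (the class and method names here are illustrative, not
the actual Journal code):

```java
// Minimal sketch of the epoch check behind the "IPC's epoch X is less
// than the last promised epoch Y" error. A real journal persists the
// promised epoch; this illustrative class keeps it in memory.
class JournalFence {
    private long promisedEpoch = 0;

    // A namenode becoming active proposes a strictly higher epoch.
    synchronized void promiseEpoch(long epoch) {
        if (epoch <= promisedEpoch) {
            throw new IllegalArgumentException(
                "Proposed epoch " + epoch + " <= promised " + promisedEpoch);
        }
        promisedEpoch = epoch;
    }

    // Every write request carries the writer's epoch; a stale writer
    // (a previous active that was partitioned away) is rejected.
    synchronized void checkRequest(long ipcEpoch) {
        if (ipcEpoch < promisedEpoch) {
            throw new IllegalStateException("IPC's epoch " + ipcEpoch
                + " is less than the last promised epoch " + promisedEpoch);
        }
    }
}
```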
>
>
> Wanted to get the thoughts of others before I opened an enhancement request.
>
> Thanks
> --
> Arpit Gupta
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [VOTE] Merge HDFS-5698 Use protobuf to serialize / deserialize FSImage to trunk

2014-01-30 Thread Todd Lipcon
One question - what's the intention with this work with regard to branch-2?
Do you plan to merge it to branch-2, and if so, are you thinking for 2.4 or
later?

I'm a little afraid of the divergence between trunk and branch-2 in this
area, since we'll likely want to add new features which require
serialization changes. With the divergent code, we'll need to add them in
trunk in PB form, and in branch-2 in "Writable" form. Sounds like a big
pain for any future backports unless we merge.

-Todd


On Thu, Jan 30, 2014 at 2:37 PM, Haohui Mai  wrote:

> Hello all,
>
> I would like to call a vote to merge of the new protobuf-based FSImage into
> trunk.
>
> The changes introduce a protobuf-based FSImage format into HDFS. It
> separates the responsibilities of serializing and reconstructing the
> namespace in the NameNode using Google's Protocol Buffers
> (protobuf). The new FSImage delegates much of the serialization /
> deserialization logic to the protobuf side. It is also designed to
> improve support for compatibility and interoperability.
>
> The development has been done in a separate branch:
> https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-5698
>
> The design document is available at:
>
> https://issues.apache.org/jira/secure/attachment/12626179/HDFS-5698-design.pdf
>
> The HDFS-5698 jira tracks the development of the feature.
>
> The feature changes no public APIs. The existing tests include both
> unit tests and end-to-end tests. They have already covered this
> feature.
>
> Once the feature is merged into trunk, we will continue to test and
> fix any bugs that may be found on trunk.
>
> The bulk of the design and implementation was done by Jing Zhao,
> Suresh Srinivas and me. Thanks Kihwal Lee, Todd Lipcon, Colin
> Patrick McCabe, Chris Nauroth, and Daryn Sharp for providing feedback on
> the jiras and in discussions.
>
> This vote runs for a week and closes on 2/6/2014 at 11:59 pm PT.
>
> Regards,
> Haohui
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5790:
-

 Summary: LeaseManager.findPath is very slow when many leases need 
recovery
 Key: HDFS-5790
 URL: https://issues.apache.org/jira/browse/HDFS-5790
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, performance
Affects Versions: 2.4.0
Reporter: Todd Lipcon


We recently saw an issue where the NN restarted while tens of thousands of 
files were open. The NN then ended up spending multiple seconds on each 
commitBlockSynchronization() call, spending most of its time inside 
LeaseManager.findPath(). findPath currently works by looping over all files 
held by a given writer and traversing the filesystem for each one. This takes 
far too long when tens of thousands of files are open by a single writer.
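The gap between the scan described above and an indexed lookup can be
sketched as follows. `LeaseIndex` and its methods are invented for
illustration and do not match the real LeaseManager API; the real scan is
even costlier because each candidate requires a filesystem traversal.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: a per-writer set of open files forces a linear
// scan per commitBlockSynchronization, while an inode-id -> path index
// makes the lookup O(1). Names are made up, not the real API.
class LeaseIndex {
    private final Map<String, Set<Long>> filesByWriter = new HashMap<>();
    private final Map<Long, String> pathByInodeId = new HashMap<>();

    void addLease(String writer, long inodeId, String path) {
        filesByWriter.computeIfAbsent(writer, w -> new HashSet<>()).add(inodeId);
        pathByInodeId.put(inodeId, path);
    }

    // Old approach: scan every file held by the writer (O(open files)).
    String findPathByScan(String writer, long inodeId) {
        for (long id : filesByWriter.getOrDefault(writer, new HashSet<>())) {
            if (id == inodeId) return pathByInodeId.get(id);
        }
        return null;
    }

    // Indexed approach: constant time regardless of open-file count.
    String findPathByIndex(long inodeId) {
        return pathByInodeId.get(inodeId);
    }
}
```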



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Deprecate BackupNode

2013-12-04 Thread Todd Lipcon
+1


On Wed, Dec 4, 2013 at 3:06 PM, Suresh Srinivas wrote:

> It has been almost a year since a jira proposed deprecating the backup node -
> https://issues.apache.org/jira/browse/HDFS-4114.
>
> Maintaining it adds unnecessary work. As an example, when I added support
> for the retry cache, there were a bunch of code paths related to the backup
> node that added unnecessary work. I do not know of anyone who is using this.
>
> If there are no objections, I want to deprecate that code in 2.3 release
> and remove it from trunk. I will start this work next week.
>
> Regards,
>
> Suresh
>
>
> --
> http://hortonworks.com/download/
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Next releases

2013-11-12 Thread Todd Lipcon
On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe wrote:

> To be honest, I'm not aware of anything in 2.2.1 that shouldn't be
> there.  However, I have only been following the HDFS and common side
> of things so I may not have the full picture.  Arun, can you give a
> specific example of something you'd like to "blow away"?
>

I agree with Colin. If we've been backporting things into a patch release
(third version component) which don't belong, we should explicitly call out
those patches, so we can learn from our mistakes and have a discussion
about what belongs. Otherwise we'll just end up doing it again. Saying
"there were a few mistakes, so let's reset back a bunch of backport work"
seems like a baby-with-the-bathwater situation.

Todd


Re: Question regarding access to different hadoop 2.0 cluster

2013-11-06 Thread Todd Lipcon
> t
> hadoop-cluster2-logicalname's namenodes.
>
> One option is to add hadoop-cluster2-logicalname's namenodes to
> /path/to/hadoop-cluster1/hdfs-site.xml. But with many clusters, this
> becomes a problem.
> Is there any other cleaner approach to solving this?
>
> --
> Have a Nice Day!
> Lohit


-- 
Todd Lipcon
Software Engineer, Cloudera


BKJM - anyone using it?

2013-11-05 Thread Todd Lipcon
Hey folks,

Is anyone using BKJM? The BookKeeper-based HA approach was developed around
the same time as QuorumJournalManager, but I haven't seen much reported
about it in the last year, and I'm not aware of anyone having deployed BKJM
in any production use cases.

If no one is using it, I would propose removing it from the HDFS tree - if
anyone would like to experiment with it in the future, it could be
built/developed on github as an external project. If a significant
community emerges in the future, we could consider moving it back in.

(this isn't a formal vote thread, just trying to find out if it is indeed
abandoned)

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Datanode fencing mechanism

2013-10-28 Thread Todd Lipcon
Hi Liu Le,

You're correct, that's an oversight that was designed but never
implemented. It's quite a rare circumstance but we should probably
implement the persistent promise as you suggested. Want to have a try at
making a patch for trunk?
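A persistent promise could look roughly like the sketch below: durably record
the promised epoch before obeying the new active, and consult the record
after a restart. The storage format and class names are assumptions for
illustration, not the actual DataNode layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of a persistent "promise": before obeying a newly-active NN,
// the DN durably records which epoch it promised to follow, so that
// after a restart it can ignore commands from a stale active.
class PromiseStore {
    private final Path file;

    PromiseStore(Path file) { this.file = file; }

    void recordPromise(long epoch) throws IOException {
        // Write to a temp file and atomically rename, so a crash never
        // leaves a half-written promise behind.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.writeString(tmp, Long.toString(epoch));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    long promisedEpoch() throws IOException {
        if (!Files.exists(file)) return 0;
        return Long.parseLong(Files.readString(file).trim());
    }

    // True if a command from a NN claiming this epoch may be obeyed.
    boolean mayObey(long commandEpoch) throws IOException {
        return commandEpoch >= promisedEpoch();
    }
}
```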

-Todd


On Mon, Oct 28, 2013 at 1:57 AM, lei liu  wrote:

> In the https://issues.apache.org/jira/browse/HDFS-1972 jira, there is the
> following case:
> Scenario 3: DN restarts during split brain period
>
> (this scenario illustrates why I think we need to persistently record the
> promise about who is active)
>
>- block has 2 replicas, user asks to reduce to 1
>- NN1 adds the block to DN1's invalidation queue, but it's backed up
>behind a bunch of other commands, so doesn't get issued yet.
>- Failover occurs, but NN1 still thinks it's active.
>- DN1 promises to NN2 not to accept commands from NN1. It sends an empty
>deletion report to NN2. Then, it crashes.
>- NN2 has received a deletion report from everyone, and asks DN2 to
>delete the block. It hasn't realized that DN1 is crashed yet.
>- DN2 deletes the block.
>
>
>- DN1 starts back up. When it comes back up, it talks to NN1 first
>(maybe it takes a while to connect to NN2 for some reason)
>   - ** Now, if we had saved the "promise" as part of persistent state,
>   we could ignore NN1 and avoid this issue. Otherwise:
>   - NN1 still thinks it's active, and sends a command to DN1 to delete
>   the block. DN1 does so.
>   - We lost the block.
>
>
> I am using the CDH4.3.1 version, and am reading the DataNode code. I don't
> find the DataNode saving the "promise" as part of persistent state. I
> want to know whether case 3 is handled in the CDH4.3.1 version. If the
> case is handled, where is the code?
>
>
> Thanks,
>
> LiuLe
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-5287) JN need not validate finalized log segments in newEpoch

2013-10-01 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5287:
-

 Summary: JN need not validate finalized log segments in newEpoch
 Key: HDFS-5287
 URL: https://issues.apache.org/jira/browse/HDFS-5287
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: qjm
Affects Versions: 2.1.1-beta
Reporter: Todd Lipcon
Priority: Minor


In {{scanStorageForLatestEdits}}, the JN will call {{validateLog}} on the last 
log segment, regardless of whether it is finalized. If it's finalized, then 
this is a needless pass over the logs which can adversely affect failover time 
for a graceful failover.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Long time fail over when using QJM

2013-08-29 Thread Todd Lipcon
If you're seeing those log messages, the SBN was already active at that
time. It only logs that message when successfully writing transactions. So,
the failover must have already completed before the logs you're looking at.

-Todd

On Thu, Aug 29, 2013 at 1:18 AM, Mickey  wrote:

> Hi, all
> I tried to test the QJM HA and it always works good. But, yestoday I met
> an quite long time fail over with QJM. The test is base on the CDH4.3.0.
> The attachment is the standby namenode and the journalnode 's logs.
> The network cable on active namenode(also a datanode) was pulled out at
> about 07:24. From the standby-namenode log I found log like this:
> 2013-08-28 07:24:51,122 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1
> Total time for transactions(ms): 1Number of transactions batched in Syncs:
> 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
> 2013-08-28 07:36:14,028 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions:
> 32 Total time for transactions(ms): 3Number of transactions batched in
> Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46
>
> The entries themselves look normal. The problem is that between the two
> lines there is no logging for 12 minutes. No long GC happened. It seems the
> code blocked somewhere. Unfortunately, I forgot to capture the jstack info
> T_T.
>
> Hope for your response.
>
> Best regards,
> Mickey
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Secure deletion of blocks

2013-08-15 Thread Todd Lipcon
Hi Matt,

I'd also recommend implementing this in a somewhat pluggable way -- eg a
configuration for a Deleter class. The default Deleter can be the one we
use today which just removes the file, and you could plug in a
SecureDeleter. I'd also see some use cases for a Deleter implementation
which doesn't actually delete the block, but instead moves it to a local
trash directory which is deleted a day or two later. This sort of policy
can help recover data as a last ditch effort if there is some kind of
accidental deletion and there aren't snapshots in place.
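The pluggable-Deleter idea could be sketched as below. The `Deleter`
interface and both implementations are hypothetical - not existing HDFS
classes - and a production secure delete would also need to sync each pass
and account for the underlying media (e.g. flash wear-leveling).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;

// Hypothetical pluggable deletion strategy for block files.
interface Deleter {
    void delete(Path block) throws IOException;
}

// Default behavior: just unlink the file, as the DN does today.
class UnlinkDeleter implements Deleter {
    public void delete(Path block) throws IOException {
        Files.deleteIfExists(block);
    }
}

// Secure variant: overwrite the contents with random data for a number
// of passes before unlinking.
class SecureDeleter implements Deleter {
    private final int passes;
    SecureDeleter(int passes) { this.passes = passes; }

    public void delete(Path block) throws IOException {
        long len = Files.size(block);
        SecureRandom rng = new SecureRandom();
        byte[] buf = new byte[8192];
        for (int p = 0; p < passes; p++) {
            try (OutputStream out =
                     Files.newOutputStream(block, StandardOpenOption.WRITE)) {
                long written = 0;
                while (written < len) {
                    rng.nextBytes(buf);
                    int n = (int) Math.min(buf.length, len - written);
                    out.write(buf, 0, n);
                    written += n;
                }
            }
        }
        Files.deleteIfExists(block);
    }
}
```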

-Todd

On Thu, Aug 15, 2013 at 11:50 AM, Andrew Wang wrote:

> Hi Matt,
>
> Here are some code pointers:
>
> - When doing a file deletion, the NameNode turns the file into a set of
> blocks that need to be deleted.
> - When datanodes heartbeat in to the NN (see BPServiceActor#offerService),
> the NN replies with blocks to be invalidated (see BlockCommand and
> DatanodeProtocol.DNA_INVALIDATE).
> - The DN processes these invalidates in
> BPServiceActor#processCommandFromActive (look for DNA_INVALIDATE again).
> - The magic lines you're looking for are probably in
> FsDatasetAsyncDiskService#run, since we delete blocks in the background
>
> Best,
> Andrew
>
>
> On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
> matt.fell...@bespokesoftware.com> wrote:
>
> > Hi,
> > I'm looking into writing a patch for HDFS which will provide a new method
> > within HDFS which can securely delete the contents of a block on all the
> > nodes upon which it exists. By securely delete I mean, overwrite with
> > 1's/0's/random data cyclically such that the data could not be recovered
> > forensically.
> >
> > I'm not currently aware of any existing code / methods which provide
> this,
> > so was going to implement this myself.
> >
> > I figured the DataNode.java was probably the place to start looking into
> > how this could be done, so I've read the source for this, but it's not
> > really enlightened me a massive amount.
> >
> > I'm assuming I need to tell the NameServer that all DataNodes with a
> > particular block id would be required to be deleted, then as each
> DataNode
> > calls home, the DataNode would be instructed to securely delete the
> > relevant block, and it would oblige.
> >
> > Unfortunately I have no idea where to begin and was looking for some
> > pointers?
> >
> > I guess specifically I'd like to know:
> >
> > 1. Where the hdfs CLI commands are implemented
> > 2. How a DataNode identifies a block / how a NameServer could inform a
> > DataNode to delete a block
> > 3. Where the existing "delete" is implemented so I can make sure my
> secure
> > delete makes use of it after successfully blanking the block contents
> > 4. If I've got the right idea about this at all?
> >
> > Kind regards,
> > Matt Fellows
> >
> > --
> > First Option Software Ltd
> > Signal House
> > Jacklyns Lane
> > Alresford
> > SO24 9JJ
> > Tel: +44 (0)1962 738232
> > Mob: +44 (0)7710 160458
> > Fax: +44 (0)1962 600112
> > Web: www.bespokesoftware.com
> >
> > This is confidential, non-binding and not company endorsed - see full
> > terms at www.fosolutions.co.uk/emailpolicy.html
> >
> > First Option Software Ltd Registered No. 06340261
> > Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-3656) ZKFC may write a null "breadcrumb" znode

2013-08-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3656.
---

  Resolution: Duplicate
Target Version/s:   (was: )

Yep, I think you're right. Thanks.

> ZKFC may write a null "breadcrumb" znode
> 
>
> Key: HDFS-3656
> URL: https://issues.apache.org/jira/browse/HDFS-3656
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>
> A user [reported|https://issues.cloudera.org/browse/DISTRO-412] an NPE trying 
> to read the "breadcrumb" znode in the failover controller. This happened 
> repeatedly, implying that an earlier process set the znode to null - probably 
> some race, though I don't see anything obvious in the code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Todd Lipcon
Hi Jeff,

Do you need to subclass or could you simply wrap? Generally composition as
opposed to inheritance is a lot safer way of integrating software written
by different parties, since inheritance exposes all the implementation
details which are subject to change.

-Todd

On Wed, Aug 7, 2013 at 10:59 AM, Jeff Dost  wrote:

> Hello,
>
> We work in a software development team at the UCSD CMS Tier2 Center.  We
> would like to propose a mechanism to allow one to subclass the
> DFSInputStream in a clean way from an external package.  First I'd like to
> give some motivation on why and then will proceed with the details.
>
> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment at
> CERN.  There are other T2 centers worldwide that contain mirrors of the
> same data we host.  We are working on an extension to Hadoop that, on
> reading a file, if it is found that there are no available replicas of a
> block, we use an external interface to retrieve this block of the file from
> another data center.  The external interface is necessary because not all
> T2 centers involved in CMS are running a Hadoop cluster as their storage
> backend.
>
> In order to implement this functionality, we need to subclass the
> DFSInputStream and override the read method, so we can catch IOExceptions
> that occur on client reads at the block level.
>
> The basic steps required:
> 1. Invent a new URI scheme for the customized "FileSystem" in
> core-site.xml:
>   <property>
>     <name>fs.foofs.impl</name>
>     <value>my.package.FooFileSystem</value>
>     <description>My Extended FileSystem for foofs: uris.</description>
>   </property>
>
> 2. Write new classes included in the external package that subclass the
> following:
> FooFileSystem subclasses DistributedFileSystem
> FooFSClient subclasses DFSClient
> FooFSInputStream subclasses DFSInputStream
>
> Now any client commands that explicitly use the foofs:// scheme in paths
> to access the hadoop cluster can open files with a customized InputStream
> that extends functionality of the default hadoop client DFSInputStream.  In
> order to make this happen for our use case, we had to change some access
> modifiers in the DistributedFileSystem, DFSClient, and DFSInputStream
> classes provided by Hadoop.  In addition, we had to comment out the check
> in the namenode code that only allows for URI schemes of the form "hdfs://".
>
> Attached is a patch file we apply to hadoop.  Note that we derived this
> patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be
> found at:
> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>
> We would greatly appreciate any advise on whether or not this approach
> sounds reasonable, and if you would consider accepting these modifications
> into the official Hadoop code base.
>
> Thank you,
> Jeff, Alja & Matevz
> UCSD Physics
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment

2013-08-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5074:
-

 Summary: Allow starting up from an fsimage checkpoint in the 
middle of a segment
 Key: HDFS-5074
 URL: https://issues.apache.org/jira/browse/HDFS-5074
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon


We've seen the following behavior a couple times:
- SBN is running and somehow encounters an error in the middle of replaying an 
edit log in the tailer (eg the JN it's reading from crashes)
- SBN successfully has processed half of the edits in the segment it was 
reading.
- SBN saves a checkpoint, which now falls in the middle of a segment, and then 
restarts

Upon restart, the SBN will load this checkpoint which falls in the middle of a 
segment. {{selectInputStreams}} then fails when the SBN requests a mid-segment 
txid.

We should handle this case by downloading the right segment and fast-forwarding 
to the correct txid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5058) QJM should validate startLogSegment() more strictly

2013-08-02 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5058:
-

 Summary: QJM should validate startLogSegment() more strictly
 Key: HDFS-5058
 URL: https://issues.apache.org/jira/browse/HDFS-5058
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: qjm
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We've seen a small handful of times a case where one of the NNs in an HA 
cluster ends up with an fsimage checkpoint that falls in the middle of an edit 
segment. We're not sure yet how this happens, but one issue can happen as a 
result:
- Node has fsimage_500. Cluster has edits_1-1000, edits_1001_inprogress
- Node restarts, loads fsimage_500
- Node wants to become active. It calls selectInputStreams(500). Currently, 
this API logs a WARN that 500 falls in the middle of the 1-1000 segment, but 
continues and returns no results.
- Node calls startLogSegment(501).

Currently, the QJM will accept this (incorrectly). The node then crashes when 
it first tries to journal a real transaction, but it ends up leaving the 
edits_501_inprogress lying around, potentially causing more issues later.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5037) Active NN should trigger its own edit log rolls

2013-07-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5037:
-

 Summary: Active NN should trigger its own edit log rolls
 Key: HDFS-5037
 URL: https://issues.apache.org/jira/browse/HDFS-5037
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon


We've seen cases where the SBN/2NN went down, and then users accumulated very 
very large edit log segments. This causes a slow startup time because the last 
edit log segment must be read fully to recover it before the NN can start up 
again. Additionally, in the case of QJM, it can trigger timeouts on recovery or 
edit log syncing because the very-large segment has to get processed within a 
certain time bound.

We could easily improve this by having the NN trigger its own edit log rolls on 
a configurable size (eg every 256MB)
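A sketch of what such a knob could look like in hdfs-site.xml (the property name below is hypothetical — the JIRA proposes the behavior, not this name):

```xml
<!-- Hypothetical: roll the active NN's current edit log segment once it
     reaches a configured size, so segments stay small even when no
     SBN/2NN is alive to trigger rolls. -->
<property>
  <name>dfs.namenode.edit.log.roll.size</name>
  <value>268435456</value> <!-- 256MB -->
</property>
```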



Re: I'm interested in working with HDFS-4680. Can somebody be a mentor?

2013-07-17 Thread Todd Lipcon
I'm happy to help with this as well. I actually have a prototype patch that
I built during a hackathon a few months ago, and was able to get a full
stack trace including Client, NN, and DN. I'm on vacation this week but
will try to post my prototype upstream when I get back. Feel free to ping
me on this if I slack :)

-Todd

On Wed, Jul 17, 2013 at 4:12 PM, Stack  wrote:

> Folks over at HBase would be interested in helping out.
>
> What does a mentor have to do?  I poked around the icfoss link but didn't
> see list of duties (I've been know to be certified blind on occasion).
>
> I am not up on the malleability of hdfs RPC; is it just a matter of adding
> the trace info to a pb header record or would it require more (Sanjay was
> saying something recently off-list that trace id is imminent -- but I've
> not done the digging)?
>
> St.Ack
>
>
> On Wed, Jul 17, 2013 at 1:44 PM, Sreejith Ramakrishnan <
> sreejith.c...@gmail.com> wrote:
>
> > Hey,
> >
> > I was originally researching options to work on ACCUMULO-1197. Basically,
> > it was a bid to pass trace functionality through the DFSClient. I
> discussed
> > with the guys over there on implementing a Google Dapper-style trace with
> > HTrace. The guys at HBase are also trying to achieve the same HTrace
> > integration [HBASE-6449]
> >
> > But, that meant adding stuff to the RPC in HDFS. For a start, we've to
> add
> > a 64-bit span-id to every RPC with tracing enabled. There's some more in
> > the original Dapper paper and HTrace documentation.
> >
> > I was told by the Accumulo people to talk with and seek help from the
> > experts at HDFS. I'm open to suggestions.
> >
> > Additionally, I'm participating in a Joint Mentoring Programme by Apache
> > which is quite similar to GSoC. Luciano Resende (Community Development,
> > Apache) is in charge of the programme. I'll attach a link. The last date
> is
> > 19th July. So, I'm pretty tensed without any mentors :(
> >
> > [1] https://issues.apache.org/jira/browse/ACCUMULO-1197
> > [2] https://issues.apache.org/jira/browse/HDFS-4680
> > [3] https://github.com/cloudera/htrace
> > [4] http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> > [5] https://issues.apache.org/jira/browse/HBASE-6449
> >
> > Thank you,
> > Sreejith R
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-4982) JournalNode should relogin from keytab before fetching logs from other JNs

2013-07-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4982:
-

 Summary: JournalNode should relogin from keytab before fetching 
logs from other JNs
 Key: HDFS-4982
 URL: https://issues.apache.org/jira/browse/HDFS-4982
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node, security
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We've seen an issue in a secure cluster where, after a failover, the new NN 
isn't able to properly coordinate QJM recovery. The JNs fail to fetch logs from 
each other due to apparently not having a Kerberos TGT. It seems that we need 
to add the {{checkTGTAndReloginFromKeytab}} call prior to making the HTTP 
connection, since the java HTTP stuff doesn't do an automatic relogin



[jira] [Resolved] (HDFS-3182) Add class to manage JournalList

2013-07-08 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3182.
---

Resolution: Won't Fix

This was obsoleted by other work in HDFS-3077

> Add class to manage JournalList
> ---
>
> Key: HDFS-3182
> URL: https://issues.apache.org/jira/browse/HDFS-3182
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, namenode
>Reporter: Suresh Srinivas
>
> See the comment for details of the JournalList ZooKeeper znode.



Re: A question for txid

2013-06-25 Thread Todd Lipcon
I did some back of the envelope math when implementing txids, and
determined that overflow is not ever going to happen... A "busy" namenode
does 1000 write transactions/second (2^10). MAX_LONG is 2^63, so at 1k tps
we can run for 2^(63-10) = 2^53 seconds. A year is about 2^25 seconds, so
you can run your namenode for 2^(63-10-25) = 2^28, roughly 268 million years.

Hadoop is great software and I'm sure it will be around for years to come,
but if it's still running in 268 million years, that will be a pretty
depressing rate of technological progress!
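The arithmetic above is easy to sanity-check, along with what an actual wrap looks like (illustrative snippet, not HDFS code):

```java
public class TxidOverflowEstimate {
    public static void main(String[] args) {
        // A "busy" NN at ~1000 (~2^10) write tps; a signed long holds 2^63 - 1.
        // Years until overflow, using the year ~ 2^25 seconds approximation:
        long yearsAtOneKtps = 1L << (63 - 10 - 25);
        System.out.println(yearsAtOneKtps); // 268435456, i.e. ~268 million

        // If the counter did overflow, the txid would wrap negative -- which
        // matches the "txid -9223372036854775808" seen later in the thread.
        System.out.println(Long.MAX_VALUE + 1 == Long.MIN_VALUE); // true
    }
}
```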

-Todd

On Tue, Jun 25, 2013 at 6:14 AM, Harsh J  wrote:

> Yes, it logically can if there have been as many transactions (it's a
> very very large number to reach though).
>
> Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.
>
> I hacked up my local NN's txids manually to go very large (close to
> max) and decided to try out if this causes any harm. I basically
> bumped up the freshly formatted starting txid to 9223372036854775805
> (and ensured image references the same):
>
> ➜  current  ls
> VERSION
> fsimage_9223372036854775805.md5
> fsimage_9223372036854775805
> seen_txid
> ➜  current  cat seen_txid
> 9223372036854775805
>
> NameNode started up as expected.
>
> 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
> in 0 seconds.
> 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
> 9223372036854775805 from
> /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
> 9223372036854775806
>
> I could create a bunch of files and do regular ops (counting to much
> after the long max increments). I created over 100 files, just to make
> it go well over the Long.MAX_VALUE.
>
> Quitting NameNode and restarting fails though, with the following error:
>
> 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> java.io.IOException: Gap in transactions. Expected to be able to read
> up until at least txid 9223372036854775806 but unable to find any edit
> logs containing txid -9223372036854775808
>
> So it looks like it cannot currently handle an overflow.
>
> I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
> this. I don't think this is of immediate concern though, so we should
> be able to address it in future (unless there's parts of the code
> which already are preventing reaching this number in the first place -
> please do correct me if there is such a part).
>
> On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu  wrote:
> > Hi dear All,
> >
> > It's long type for the txid currently,
> >
> > FSImage.java:
> >
> > boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> > throws IOException{
> >
> >   editLog.setNextTxId(lastAppliedTxId + 1L);
> > }
> >
> > Is it possible that (lastAppliedTxId + 1L) exceed Long.MAX_VALUE ?
>
>
>
> --
> Harsh J
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: DesignLounge @ HadoopSummit

2013-06-24 Thread Todd Lipcon
Cool, thanks Eric. I don't see any particular need to have a session on QJM
-- it seems to be working fine in our customers' production clusters, so I
don't have any near term road map in mind to discuss. But if anyone else
wants to sync up, happy to do so - just let me know.

-Todd

On Mon, Jun 24, 2013 at 8:44 PM, Eric Baldeschwieler  wrote:

> Hi Todd,
>
> The room is planned to have 4 - 5 discuss stations and should accommodate
> lots of folks.
>
> So if a group decides to meet on QJM at another time, there should be
> space.
>
> Let me know what your experience with this format is.
>
> E14
>
> On Jun 24, 2013, at 2:50 PM, Todd Lipcon  wrote:
>
> > Hey all,
> >
> > Unfortunately I will have to leave early on Thursday, so will miss the
> slotted time at the design lounge.
> >
> > But, I'll be at the summit all day Wed and on Thurs morning, so if
> anyone wants to catch up regarding QJM, HA, or performance, feel free to
> hunt me down and grab me. I hesitate to put my cell # on the mailing list,
> but many of you have it, so feel free to send a text, or tweet at me if you
> want to meet up.
> >
> > -Todd
> >
> > On Mon, Jun 24, 2013 at 2:48 PM, Suresh Srinivas 
> wrote:
> > FYI HDFS committers and developers... This will be participant-driven
> > unconference style session.
> >
> > So no formal presentations are planned. Some of the things that we could
> > discuss are:
> > - New features recently added to HDFS such as HA, Snapshots, NFS etc.
> > - Other features that we as a community could focus on - such as
> > heterogeneous storage/support SSDs, improvements for more realtime
> > applications in HDFS.
> >
> >  If you have other topics in mind, please do bring them up.
> >
> >
> >
> >
> >
> >
> >
> > -- Forwarded message --
> > From: Eric Baldeschwieler 
> > Date: Sun, Jun 23, 2013 at 9:32 PM
> > Subject: DesignLounge @ HadoopSummit
> > To: "common-...@hadoop.apache.org" , "
> > mapreduce-...@hadoop.apache.org" , "
> > hdfs-dev@hadoop.apache.org" 
> >
> >
> > Hi Folks,
> >
> > I've integrated the feedback I've gotten on the design lounge.  A couple
> of
> > clarifications:
> >
> > 1) The space will be open both days of the formal summit.  Apache
> > Committers / contributors are invited to stop by any time and use the
> space
> > to meet / network any time during the show.
> >
> > 2) Below I've listed the times that various project members have
> suggested
> > they will be present to talk with others contributors about their
> project.
> >  If we get a big showing for any of these slots we'll encourage folks to
> do
> > the unconference thing: Select a set of topics they want to talk about
> and
> > break up into groups to do so.
> >
> > 3) This is an experiment.  Our goal is to make the summit as useful as
> > possible to the folks who build the Apache projects in the Apache Hadoop
> > stack.  Please let me know how it works for you and ideas for making this
> > even more effective.
> >
> > Committed times so far, with topic champion (Note - I've adjusted
> suggested
> > times to fit with the program a bit more smoothly):
> >
> > Wednesday
> > 11-1 - Hive - Ashutosh - The stinger initiative and other Hive activities
> > 2 - 4 - Security breakout - Kevin Minder - HSSO, Knox, Rhino
> > 3 - 4 - Frameworks to run services like HBase on Yarn - Weave, Hoya … -
> > Devaraj Das
> > 4 - 5 - Accumulo - Billie Rinaldi
> >
> >
> > Thursday
> > 11-1 - Finishing Yarn - Arun Murthy - Near term improvements needed
> > 2 - 4 - HDFS - Suresh & Sanjay
> > 4 - 5 - Getting involved in Apache - Billie Rinaldi
> >
> >
> > See you all soon!
> >
> > E14
> >
> > PS Please forward to other Apache -dev lists and CC me.  Thanks!
> >
> > On Jun 11, 2013, at 10:42 AM, Eric Baldeschwieler <
> eri...@hortonworks.com>
> > wrote:
> >
> > > Hi Folks,
> > >
> > > We thought we'd try something new at Hadoop Summit this year to build
> > upon two pieces of feedback I've heard a lot this year:
> > >
> > >   • Apache project developers would like to take advantage of the
> > Hadoop summit to meet with their peers to on work on specific technical
> > details of their projects
> > >   • That they want to do this during the summit, not before it
> starts
> > or at night. I'

Re: DesignLounge @ HadoopSummit

2013-06-24 Thread Todd Lipcon
of detailed
> proposals circulating.  Let's get together and hash out the differences.
> >   • Accumulo 1.6 features
> >   • The Hive vectorization project.  Discussion of the design and how
> to phase it in incrementally with minimum complexity.
> >   • Finishing Yarn - what things need to get done NOW to make Yarn
> more effective
> > If you are a project lead for one of the Apache projects, look at the
> schedule below and suggest a few slots when you think it would be best for
> your project to meet.  I'll try to work out a schedule where no more than 2
> projects are using the lounge at once.
> >
> > Day 1, 26th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm
> >
> > Day 2, 27th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm
> >
> > It will be up to you, the hadoop contributors, from there.
> >
> > Look forward to seeing you all at the summit,
> >
> > E14
> >
> > PS Please forward to the other -dev lists.  This event is for folks on
> the -dev lists.
> >
>
>
>
>
> --
> http://hortonworks.com/download/
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads up: branch-2.1-beta

2013-06-19 Thread Todd Lipcon
Have you tried running with -XX:+HeapDumpOnOutOfMemoryError? Got a heap
dump folks can look at?

The new snapshot stuff may be at fault... other than that, no particular
ideas.

On Wed, Jun 19, 2013 at 10:02 PM, Roman Shaposhnik  wrote:

> On Wed, Jun 19, 2013 at 5:21 PM, Roman Shaposhnik  wrote:
> > On Wed, Jun 19, 2013 at 4:29 PM, Roman Shaposhnik 
> wrote:
> >> On Tue, Jun 18, 2013 at 11:58 PM, Arun C Murthy 
> wrote:
> >>> Ping. Any luck?
> >
> > One final question -- have the memory requirements for the NN changed? I've
> > been always running my Bigtop tests with 1G of heap and I've never
> > seen any issues. In this test run NN kept OOMing till I doubled its
> > heap size. Is this expected?
>
> To follow: NN keeps OOMing. It seems that a reliable way to make it happen
> is to run Bigtop's version of TestCLI against a fully distributed
> cluster. At this
> point I'm actually reasonably concerned and will follow up tomorrow.
>
> Thanks,
> Roman.
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-4915) Add config to ZKFC to disable fencing

2013-06-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4915:
-

 Summary: Add config to ZKFC to disable fencing
 Key: HDFS-4915
 URL: https://issues.apache.org/jira/browse/HDFS-4915
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 3.0.0
Reporter: Todd Lipcon


With QuorumJournalManager, it's not important for the ZKFCs to perform any 
fencing. We currently work around this by setting the fencer to /bin/true, but 
the ZKFC still does things like create "breadcrumb znodes", etc. It would be 
simpler to add a config to disable fencing, and then the ZKFC's job would be 
simpler
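As a sketch, the workaround described above typically looks like this in hdfs-site.xml (the property is the standard HA fencing config; the value is the no-op fencer from the description):

```xml
<!-- Workaround described above: satisfy the ZKFC's fencer requirement with
     a no-op shell fencer, since QJM already fences writers via epochs. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
```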



Re: dfs.datanode.socket.reuse.keepalive

2013-06-10 Thread Todd Lipcon
+1 for dropping the client side expiry down to something like 1-2 seconds.
I'd rather do that than up the server side, since the server side resource
(DN threads) is likely to be more contended.

-Todd

On Fri, Jun 7, 2013 at 4:29 PM, Colin McCabe  wrote:

> Hi all,
>
> HDFS-941 added dfs.datanode.socket.reuse.keepalive.  This allows
> DataXceiver worker threads in the DataNode to linger for a second or
> two after finishing a request, in case the client wants to send
> another request.  On the client side, HDFS-941 added a SocketCache, so
> that subsequent client requests could reuse the same socket.  Sockets
> were closed purely by an LRU eviction policy.
>
> Later, HDFS-3373 added a minimum expiration time to the SocketCache,
> and added a thread which periodically closed old sockets.
>
> However, the default timeout for SocketCache (which is now called
> PeerCache) is much longer than the DN would possibly keep the socket
> open.  Specifically, dfs.client.socketcache.expiryMsec defaults to 2 *
> 60 * 1000 (2 minutes), whereas dfs.datanode.socket.reuse.keepalive
> defaults to 1000 (1 second).
>
> I'm not sure why we have such a big disparity here.  It seems like
> this will inevitably lead to clients trying to use sockets which have
> gone stale, because the server closes them way before the client
> expires them.  Unless I'm missing something, we should probably either
> lengthen the keepalive, or shorten the socket cache expiry, or both.
>
> thoughts?
> Colin
>
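Concretely, the suggestion amounts to keeping the client-side expiry at or below the DN-side keepalive; a sketch using the property names from this thread (values illustrative):

```xml
<!-- DN side: how long a DataXceiver thread lingers after a request
     (default 1000 ms per the thread). -->
<property>
  <name>dfs.datanode.socket.reuse.keepalive</name>
  <value>1000</value>
</property>
<!-- Client side: drop the cache expiry below the DN keepalive so clients
     stop reusing sockets the DN has already closed (default was 120000). -->
<property>
  <name>dfs.client.socketcache.expiryMsec</name>
  <value>900</value>
</property>
```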



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-4879) Add "blocked ArrayList" collection to avoid CMS full GCs

2013-06-04 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4879:
-

 Summary: Add "blocked ArrayList" collection to avoid CMS full GCs
 Key: HDFS-4879
 URL: https://issues.apache.org/jira/browse/HDFS-4879
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.4-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We recently saw an issue where a large deletion was issued which caused 25M 
blocks to be collected during {{deleteInternal}}. Currently, the list of 
collected blocks is an ArrayList, meaning that we had to allocate a contiguous 
25M-entry array (~400MB). After a NN has been running for a long amount of 
time, the old generation may become fragmented such that it's hard to find a 
400MB contiguous chunk of heap.

In general, we should try to design the NN such that the only large objects are 
long-lived and created at startup time. We can improve this particular case 
(and perhaps some others) by introducing a new List implementation which is 
made of a linked list of arrays, each of which is size-limited (eg to 1MB).
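The idea is simple enough to sketch; this is an illustration of the data structure described above, not the actual HDFS-4879 patch:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration of the "blocked ArrayList" idea: a list backed by a chain of
 * bounded chunks, so no single allocation ever exceeds chunkSize entries.
 * A 25M-entry list becomes many small arrays instead of one ~400MB
 * contiguous one, which is friendlier to a fragmented CMS old generation.
 */
public class BlockedList<T> {
    private final int chunkSize;
    private final List<List<T>> chunks = new ArrayList<>();
    private int size = 0;

    public BlockedList(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    public void add(T item) {
        // Start a new bounded chunk when the last one is full (or none exist).
        if (chunks.isEmpty()
                || chunks.get(chunks.size() - 1).size() == chunkSize) {
            chunks.add(new ArrayList<T>(chunkSize));
        }
        chunks.get(chunks.size() - 1).add(item);
        size++;
    }

    public T get(int i) {
        // Fixed chunk size keeps random access O(1).
        return chunks.get(i / chunkSize).get(i % chunkSize);
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        BlockedList<Integer> l = new BlockedList<>(4);
        for (int i = 0; i < 10; i++) l.add(i);
        System.out.println(l.size() + " " + l.get(7)); // 10 7
    }
}
```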



[jira] [Resolved] (HDFS-4184) Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurate

2013-05-28 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4184.
---

Resolution: Duplicate

> Add the ability for Client to provide more hint information for DataNode to 
> manage the OS buffer cache more accurate
> 
>
> Key: HDFS-4184
> URL: https://issues.apache.org/jira/browse/HDFS-4184
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: binlijin
>
> HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to 
> manage the OS buffer cache.
> {code}
> When hbase read hlog the data we can set dfs.datanode.drop.cache.behind.reads 
> to true to drop data out of the buffer cache when performing sequential reads.
> When hbase write hlog we can set dfs.datanode.drop.cache.behind.writes to 
> true to drop data out of the buffer cache after writing
> When hbase read hfile during compaction we can set 
> dfs.datanode.readahead.bytes to a non-zero value to trigger readahead for 
> sequential reads, and also set dfs.datanode.drop.cache.behind.reads to true 
> to drop data out of the buffer cache when performing sequential reads.
> and so on... 
> {code}
> Current we can only set these feature global in datanode,we should set these 
> feature per session.



[jira] [Created] (HDFS-4833) Corrupt blocks are not invalidated when first processing repl queues

2013-05-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4833:
-

 Summary: Corrupt blocks are not invalidated when first processing 
repl queues
 Key: HDFS-4833
 URL: https://issues.apache.org/jira/browse/HDFS-4833
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Todd Lipcon


When the NN processes misreplicated blocks in {{processMisReplicatedBlock}} (eg 
during initial startup when first processing repl queues), it does not 
invalidate corrupt replicas unless the block is also over-replicated. This can 
result in replicas stuck in "corrupt" state forever if they were that way when 
the cluster booted.



[jira] [Created] (HDFS-4828) Make QJM epoch-related errors more understandable

2013-05-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4828:
-

 Summary: Make QJM epoch-related errors more understandable
 Key: HDFS-4828
 URL: https://issues.apache.org/jira/browse/HDFS-4828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: qjm
Affects Versions: 3.0.0, 2.0.5-beta
Reporter: Todd Lipcon


Since we started running QJM on production clusters, we've found that users are 
very confused by some of the error messages that it produces. For example, when 
a failover occurs and an old NN gets fenced out, it sees errors about its epoch 
being out of date. We should amend these errors to add text like "This is 
likely because another NameNode took over as Active." Potentially we can even 
include the other NN's hostname, timestamp it became active, etc.



[jira] [Created] (HDFS-4799) Corrupt replica can be prematurely removed from corruptReplicas map

2013-05-05 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4799:
-

 Summary: Corrupt replica can be prematurely removed from 
corruptReplicas map
 Key: HDFS-4799
 URL: https://issues.apache.org/jira/browse/HDFS-4799
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker


We saw the following sequence of events in a cluster result in losing the most 
recent genstamp of a block:
- client is writing to a pipeline of 3
- the pipeline had nodes fail over some period of time, such that it left 3 
old-genstamp replicas on the original three nodes, having recruited 3 new 
replicas with a later genstamp.
-- so, we have 6 total replicas in the cluster, three with old genstamps on 
downed nodes, and 3 with the latest genstamp
- cluster reboots, and the nodes with old genstamps blockReport first. The 
replicas are correctly added to the corrupt replicas map since they have a 
too-old genstamp
- the nodes with the new genstamp block report. When the latest one block 
reports, chooseExcessReplicates is called and incorrectly decides to remove the 
three good replicas, leaving only the old-genstamp replicas.



Re: DistributedFileSystem.listStatus() - Why does it do partial listings then assemble?

2013-05-02 Thread Todd Lipcon
Hi Brad,

The reasoning is that the NameNode locking is somewhat coarse grained. In
older versions of Hadoop, before it worked this way, we found that listing
large directories (eg with 100k+ files) could end up holding the namenode's
lock for a quite long period of time and starve other clients.

Additionally, I believe there is a second API that does the "on-demand"
fetching of the next set of files from the listing as well, no?

As for the consistency argument, you're correct that you may have a
non-atomic view of the directory contents, but I can't think of any
applications where this would be problematic.
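The batch-at-a-time pattern Todd describes — each fetch short enough to avoid starving other clients under a coarse lock, optionally surfaced as an on-demand iterator — can be sketched with a toy paged service standing in for the NameNode (all names here are hypothetical, not Hadoop's actual API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy paged directory listing: each "RPC" returns one bounded batch, so no
 *  single call holds the (coarse) server-side lock for the whole listing. */
public class PagedListingDemo {
    static final int BATCH = 3;
    static final List<String> DIR = List.of("a", "b", "c", "d", "e", "f", "g");

    // One "RPC": at most BATCH entries starting at the given offset.
    static List<String> listBatch(int start) {
        return DIR.subList(start, Math.min(start + BATCH, DIR.size()));
    }

    // Iterator flavor: fetch the next batch lazily, only when needed.
    static Iterator<String> listIterator() {
        return new Iterator<>() {
            List<String> batch = listBatch(0);
            int posInBatch = 0, fetched = 0;

            public boolean hasNext() {
                if (posInBatch < batch.size()) return true;
                fetched += batch.size();
                batch = (fetched < DIR.size()) ? listBatch(fetched) : List.of();
                posInBatch = 0;
                return !batch.isEmpty();
            }

            public String next() {
                return batch.get(posInBatch++);
            }
        };
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        Iterator<String> it = listIterator();
        while (it.hasNext()) all.add(it.next());
        System.out.println(all); // [a, b, c, d, e, f, g]
    }
}
```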

-Todd

On Thu, May 2, 2013 at 9:18 AM, Brad Childs  wrote:

> Could someone explain why the DistributedFileSystem's listStatus() method
> does a piecemeal assembly of a directory listing within the method?
>
> Is there a locking issue? What if an element is added to the directory
> during the operation?  What if elements are removed?
>
> It would make sense to me that the FileSystem class listStatus() method
> returned an Iterator allowing only partial fetching/chatter as needed.  But
> I dont understand why you'd want to assemble a giant array of the listing
> chunk by chunk.
>
>
> Here's the source of the listStatus() method, and I've linked the entire
> class below.
>
>
> -
>
>   public FileStatus[] listStatus(Path p) throws IOException {
> String src = getPathName(p);
>
> // fetch the first batch of entries in the directory
> DirectoryListing thisListing = dfs.listPaths(
> src, HdfsFileStatus.EMPTY_NAME);
>
> if (thisListing == null) { // the directory does not exist
>   return null;
> }
>
> HdfsFileStatus[] partialListing = thisListing.getPartialListing();
> if (!thisListing.hasMore()) { // got all entries of the directory
>   FileStatus[] stats = new FileStatus[partialListing.length];
>   for (int i = 0; i < partialListing.length; i++) {
> stats[i] = makeQualified(partialListing[i], p);
>   }
>   statistics.incrementReadOps(1);
>   return stats;
> }
>
> // The directory size is too big that it needs to fetch more
> // estimate the total number of entries in the directory
> int totalNumEntries =
>   partialListing.length + thisListing.getRemainingEntries();
> ArrayList<FileStatus> listing =
>   new ArrayList<FileStatus>(totalNumEntries);
> // add the first batch of entries to the array list
> for (HdfsFileStatus fileStatus : partialListing) {
>   listing.add(makeQualified(fileStatus, p));
> }
> statistics.incrementLargeReadOps(1);
>
> // now fetch more entries
> do {
>   thisListing = dfs.listPaths(src, thisListing.getLastName());
>
>   if (thisListing == null) {
> return null; // the directory is deleted
>   }
>
>   partialListing = thisListing.getPartialListing();
>   for (HdfsFileStatus fileStatus : partialListing) {
> listing.add(makeQualified(fileStatus, p));
>   }
>   statistics.incrementLargeReadOps(1);
> } while (thisListing.hasMore());
>
> return listing.toArray(new FileStatus[listing.size()]);
>   }
>
> --------
>
>
>
>
>
> Ref:
>
> https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java
> http://docs.oracle.com/javase/6/docs/api/java/util/Iterator.html
>
>
> thanks!
>
> -bc
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Heads up - Snapshots feature merge into trunk

2013-04-24 Thread Todd Lipcon
On Fri, Apr 19, 2013 at 3:36 AM, Aaron T. Myers  wrote:

> On Fri, Apr 19, 2013 at 6:53 AM, Tsz Wo Sze  wrote:
>
> > HdfsAdmin is also for admin operations.  However, createSnapshot etc
> > methods aren't.
> >
>
> I agree that they're not administrative operations in the sense that they
> don't strictly require super user privilege, but they are "administrative"
> in the sense that they will most-often be used by those administering HDFS.
> The HdfsAdmin class should not be construed to contain only operations
> which require super user privilege, even though that happens to be the case
> right now. It's intended as just a public API for HDFS-specific operations.
>
> Regardless, my point is not necessarily that these operations should go
> into the HdfsAdmin class, but rather that they shouldn't go into the
> FileSystem class, since the snapshots API doesn't seem to me like it will
> generalize to other FileSystem implementations.
>
>
Agreed. The cases of WAFL/ZFS were brought up -- in those file systems,
even if users may take snapshots, they're done using FS-specific APIs
rather than any standard Linux interface. So, I'm in favor of either
putting the APIs in HdfsAdmin, or alternatively in DistributedFileSystem,
forcing a user to down-cast if they want to use the HDFS-specific operation.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-04-08 Thread Todd Lipcon
+1 for the branch merge. I've reviewed all of the code in the branch, and
we have people now running this code in production scenarios. It is as
functional as the old version and way easier to set up/configure.

-Todd

On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe  wrote:

> Hi all,
>
> I think it's time to merge the HDFS-347 branch back to trunk.  It's been
> under
> review and testing for several months, and provides both a performance
> advantage, and the ability to use short-circuit local reads without
> compromising system security.
>
> Previously, we tried to merge this and the objection was brought up that we
> should keep the old, insecure short-circuit local reads around so that
> platforms for which secure SCR had not yet been implemented could use it
> (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
>  Suresh has also volunteered to maintain the insecure SCR code until secure
> SCR can be implemented for Windows.
>
> Please cast your vote by EOD Monday 4/8.
>
> best,
> Colin
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-04-04 Thread Todd Lipcon
On Thu, Apr 4, 2013 at 9:11 PM, Tsz Wo Sze  wrote:

> Colin,
>
> We usually conclude the last VOTE before starting a new one.  Otherwise,
> people may be confused between the VOTEs.  (In case you don't know our
> convention.  Please check with someone before starting a VOTE.  Thanks.)
>

>
> -1
> * The previous VOTE started by Colin has not been concluded.
>

I can't tell if you're being serious about this... April fools was a few
days ago. This is ridiculous - the previous vote was called 2/17 and
explicitly said it was ending on 2/24. Do you think anyone's confused about
which vote is active a month and a half later?


>
> * The branch is not ready.  The code misuses DataTransferProtocol.
> Documentation of the new conf properties are missing.  Also, the code in
> the branch needs to be polished.  See HDFS-347 and HDFS-4661 for more
> details.
>

During the last vote thread, both you and Suresh said you'd actively review
the changes Colin made in response to your review feedback. Then, after
Colin posted a patch to address your complaints, it sat unreviewed for a
month before I reviewed and committed it. Now, Colin calls another vote,
and you find more nit picks in the branch, which again are not new code and
have been there for months.

I don't see how you can possibly think this is a reasonable way of going
about your duties as a reviewer of the branch, nor why you are voting -1
due to a few small nits in the codebase. Actions like these limit the
growth of our contributor base and discourage others from joining our
development community -- I for one am quite impressed with Colin's patience
throughout this ridiculous ordeal, but many others wouldn't have the same
fortitude.

If you find issues with the branch, put up a patch and let's get on with
it. This back-and-forthing is wasting all of our time.

Todd



>
> 
>  From: Colin McCabe 
> To: hdfs-dev@hadoop.apache.org
> Sent: Tuesday, April 2, 2013 7:32 AM
> Subject: VOTE: HDFS-347 merge
>
> Hi all,
>
> I think it's time to merge the HDFS-347 branch back to trunk.  It's been
> under
> review and testing for several months, and provides both a performance
> advantage, and the ability to use short-circuit local reads without
> compromising system security.
>
> Previously, we tried to merge this and the objection was brought up that we
> should keep the old, insecure short-circuit local reads around so that
> platforms for which secure SCR had not yet been implemented could use it
> (e.g. Windows).  This has been addressed-- see HDFS-4538 for details.
> Suresh has also volunteered to maintain the insecure SCR code until secure
> SCR can be implemented for Windows.
>
> Please cast your vote by EOD Monday 4/8.
>
> best,
> Colin
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Exception with QJM HDFS HA

2013-03-31 Thread Todd Lipcon
This looks like a bug with the new inode ID code in trunk, rather than a
bug with QJM or HA.

Suresh/Brandon, any thoughts?

-Todd

On Sun, Mar 31, 2013 at 6:43 PM, Azuryy Yu  wrote:

> Hi All,
>
> I configured HDFS Ha using source code from trunk r1463074.
>
> I got an exception as follows when I put a file to the HDFS.
>
> 13/04/01 09:33:45 WARN retry.RetryInvocationHandler: Exception while
> invoking addBlock of class ClientNamenodeProtocolTranslatorPB. Trying to
> fail over immediately.
> 13/04/01 09:33:45 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.FileNotFoundException: ID mismatch. Request id and saved id: 1073 ,
> 1050
> at
> org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:51)
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2501)
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2298)
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2212)
> at
>
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:498)
> at
>
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
> at
>
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40979)
> at
>
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:526)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1018)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1818)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1814)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1812)
>
>
> please reproduce as :
>
> hdfs dfs -put test.data  /user/data/test.data
> after this command start to run, then kill active name node process.
>
>
> I have only three nodes(A,B,C) for test
> A and B are name nodes.
> B and C are data nodes.
> ZK deployed on A, B and C.
>
> A, B and C are all journal nodes.
>
> Thanks.
>
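
The FileNotFoundException in the trace above comes from an inode-ID consistency check on the NameNode: the client sends the inode ID it knows for the file, and the server rejects the call if that ID no longer matches what it has saved. A hedged sketch of that kind of check (the class name mirrors the trace, but the field value and exact logic are assumptions, not the actual HDFS source):

```java
public class InodeIdCheck {
    // Clients from before inode IDs existed send a sentinel value,
    // which skips the check entirely. The value 0 is an assumption.
    public static final long GRANDFATHER_INODE_ID = 0;

    // Reject the request if the client's notion of the file's inode ID
    // disagrees with the one the NameNode has saved.
    public static void checkId(long requestId, long savedId)
            throws java.io.FileNotFoundException {
        if (requestId != GRANDFATHER_INODE_ID && requestId != savedId) {
            throw new java.io.FileNotFoundException(
                "ID mismatch. Request id and saved id: " + requestId + " , " + savedId);
        }
    }
}
```

A mismatch like "1073 , 1050" after a failover suggests the two NameNodes disagreed about inode ID assignment, which is why this reads like a bug in the inode ID code rather than in QJM or HA itself.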



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-4617) warning while purging logs with QJM enabled

2013-03-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4617.
---

Resolution: Duplicate

ATM points out that I already found this bug 3 months ago... resolving as a 
duplicate of HDFS-4298.

> warning while purging logs with QJM enabled
> ---
>
> Key: HDFS-4617
> URL: https://issues.apache.org/jira/browse/HDFS-4617
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Affects Versions: 2.0.3-alpha
>    Reporter: Todd Lipcon
>
> HDFS-2946 changed the way that edit log purging is calculated, such that it 
> calls selectInputStreams() with an arbitrary transaction ID calculated 
> relative to the current one. The JournalNodes will reject such a request if 
> that transaction ID falls in the middle of a segment (which it usually will). 
> This means that selectInputStreams gets an exception, and the QJM journal 
> manager is not included in this calculation. Additionally, a warning will be 
> logged.
> Purging itself still happens, because the detailed information on remote logs 
> is not necessary to calculate a retention interval, but the feature from 
> HDFS-2946 may not work as intended.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4538) allow use of legacy blockreader

2013-03-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4538.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to HDFS-347 branch.

> allow use of legacy blockreader
> ---
>
> Key: HDFS-4538
> URL: https://issues.apache.org/jira/browse/HDFS-4538
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4538.001.patch, HDFS-4538.002.patch, 
> HDFS-4538.003.patch, HDFS-4538.004.patch
>
>
> Some users might want to use the legacy block reader, because it is available 
> on Windows, whereas the secure solution has not yet been implemented there.  
> As described in the mailing list discussion, let's enable this.



[jira] [Created] (HDFS-4643) Fix flakiness in TestQuorumJournalManager

2013-03-27 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4643:
-

 Summary: Fix flakiness in TestQuorumJournalManager
 Key: HDFS-4643
 URL: https://issues.apache.org/jira/browse/HDFS-4643
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: qjm, test
Affects Versions: 2.0.3-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial


TestQuorumJournalManager can occasionally fail if two consecutive test cases 
pick the same port number for the JournalNodes. In this case, sometimes an IPC 
client can be cached from a previous test case, and then fail when it tries to 
make an IPC call over that cached, now-broken connection. We need to call 
close() on all the QJMs more carefully to prevent this.
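
One robust pattern for this kind of fix is to track every client-like resource a test case opens and close them all in teardown, so no cached IPC connection survives into the next test case. The sketch below is a generic illustration of that pattern, not the actual TestQuorumJournalManager code; the `ResourceTracker` class is hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Test-side bookkeeping: register every client-like resource a test case
// opens, then close them all before the next case can reuse its port.
public class ResourceTracker {
    private final List<Closeable> opened = new ArrayList<>();

    // Register a resource for later cleanup and hand it back to the caller.
    public <T extends Closeable> T track(T resource) {
        opened.add(resource);
        return resource;
    }

    // Called from a tearDown/@After method: best-effort close of everything,
    // so no cached connection leaks into the next test case.
    public void closeAll() {
        for (Closeable c : opened) {
            try {
                c.close();
            } catch (IOException e) {
                // Best-effort during teardown; a real test would log this.
            }
        }
        opened.clear();
    }
}
```

With this in place, a test that forgets an explicit close still cannot leak a cached client past its own teardown.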



[jira] [Created] (HDFS-4638) TransferFsImage should take Configuration as parameter

2013-03-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4638:
-

 Summary: TransferFsImage should take Configuration as parameter
 Key: HDFS-4638
 URL: https://issues.apache.org/jira/browse/HDFS-4638
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Priority: Minor


TransferFsImage currently creates a new HdfsConfiguration object, rather than 
taking one passed in. This means that, when using {{dfsadmin -fetchImage}}, you 
can't pass a different timeout on the command line, since the Tool's 
configuration doesn't get plumbed through.
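
The fix amounts to plain parameter injection: accept the caller's configuration instead of constructing a fresh one internally. The sketch below illustrates the pattern only; the `Conf` stand-in, the `ImageTransfer` class, and the config key name are all invented for illustration, not the actual TransferFsImage API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Hadoop's Configuration, to keep the sketch
// self-contained. Not the real class.
class Conf {
    private final Map<String, Integer> ints = new HashMap<>();
    void setInt(String key, int value) { ints.put(key, value); }
    int getInt(String key, int defaultValue) { return ints.getOrDefault(key, defaultValue); }
}

public class ImageTransfer {
    private final int timeoutMs;

    // Before the fix: constructing "new Conf()" here would ignore any
    // overrides the Tool parsed from the command line. Taking conf as a
    // parameter plumbs the caller's settings through.
    public ImageTransfer(Conf conf) {
        this.timeoutMs = conf.getInt("image.transfer.timeout.ms", 60_000);
    }

    public int getTimeoutMs() { return timeoutMs; }
}
```

Any timeout set on the Tool's configuration now reaches the transfer code, instead of being silently replaced by defaults.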



[jira] [Created] (HDFS-4621) additional logging to help diagnose slow QJM logSync

2013-03-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4621:
-

 Summary: additional logging to help diagnose slow QJM logSync
 Key: HDFS-4621
 URL: https://issues.apache.org/jira/browse/HDFS-4621
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.0.3-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


I've been working on diagnosing an issue with a cluster that occasionally sees 
slow logSync calls to QJM. Adding a few more pieces of logging would help 
diagnose this:
- in the warning messages on the client side leading up to a timeout, include 
which nodes have responded and which ones are still pending
- on the server side, when we actually call FileChannel.force, log a warning if 
the sync takes longer than 1 second
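
The server-side half of that logging can be as simple as timing the force() call and warning past a threshold. A minimal sketch of the idea, assuming a 1-second threshold and plain stderr logging (the class and method names are illustrative, not the JournalNode code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;

public class TimedSync {
    // Assumed threshold from the proposal: warn if a sync exceeds 1 second.
    public static final long WARN_THRESHOLD_MS = 1000;

    // Force the channel's data to disk, returning the elapsed milliseconds
    // and warning if the sync took longer than the threshold.
    public static long forceWithWarning(FileChannel channel) throws IOException {
        long start = System.nanoTime();
        channel.force(false); // false: file metadata need not be synced too
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (elapsedMs > WARN_THRESHOLD_MS) {
            System.err.println("WARN: sync to disk took " + elapsedMs + " ms");
        }
        return elapsedMs;
    }
}
```

Returning the elapsed time also lets the caller feed it into metrics, which is often more useful than the log line alone.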



[jira] [Created] (HDFS-4618) default for checkpoint txn interval is too low

2013-03-19 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4618:
-

 Summary: default for checkpoint txn interval is too low
 Key: HDFS-4618
 URL: https://issues.apache.org/jira/browse/HDFS-4618
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The default checkpoint interval is currently set to 40k transactions. That's 
way too low (I don't know what idiot set it to that.. oh wait, it was me...)

The old default in 1.0 is 64MB. Assuming an average of 100 bytes per txn, we 
should have the txn-count based interval default to at least 640,000. I'd like 
to change it to 1M as a nice round number.
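
The back-of-the-envelope math can be checked directly: with a 64 MB size-based default and an assumed ~100 bytes per transaction, the equivalent transaction count is 640,000, an order of magnitude above the 40k default. A small sketch of the arithmetic (the 100-byte average is the assumption stated above, not a measured figure):

```java
public class CheckpointMath {
    // Old 1.0-era default: checkpoint every 64 MB of edits. Decimal MB is
    // used here, matching the 640,000 figure in the discussion.
    public static final long OLD_SIZE_BYTES = 64L * 1000 * 1000;
    public static final long BYTES_PER_TXN = 100; // assumed average txn size

    // Transaction count equivalent to the old size-based interval.
    public static long equivalentTxnCount() {
        return OLD_SIZE_BYTES / BYTES_PER_TXN; // 64,000,000 / 100 = 640,000
    }
}
```

Rounding 640,000 up to 1M keeps the new default comfortably above the old size-based behavior.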



[jira] [Created] (HDFS-4617) warning while purging logs with QJM enabled

2013-03-19 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4617:
-

 Summary: warning while purging logs with QJM enabled
 Key: HDFS-4617
 URL: https://issues.apache.org/jira/browse/HDFS-4617
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.0.3-alpha
Reporter: Todd Lipcon


HDFS-2946 changed the way that edit log purging is calculated, such that it 
calls selectInputStreams() with an arbitrary transaction ID calculated relative 
to the current one. The JournalNodes will reject such a request if that 
transaction ID falls in the middle of a segment (which it usually will). This 
means that selectInputStreams gets an exception, and the QJM journal manager is 
not included in this calculation. Additionally, a warning will be logged.

Purging itself still happens, because the detailed information on remote logs 
is not necessary to calculate a retention interval, but the feature from 
HDFS-2946 may not work as intended.
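
The rejection described above is essentially a segment-boundary check: a requested start transaction is only served if it is the first txid of some segment, so an arbitrary txid that lands mid-segment fails. A simplified sketch of that kind of check (the data model here is invented for illustration and much simpler than the real JournalNode's):

```java
import java.util.List;

public class SegmentCheck {
    // A finalized edit-log segment covering [firstTxId, lastTxId].
    public static class Segment {
        public final long firstTxId;
        public final long lastTxId;

        public Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    // JournalNode-style check: only serve a stream that starts exactly at a
    // segment boundary; a txid in the middle of a segment is rejected.
    public static boolean isValidStart(List<Segment> segments, long fromTxId) {
        for (Segment s : segments) {
            if (s.firstTxId == fromTxId) {
                return true;
            }
        }
        return false;
    }
}
```

Since the purging code computes its start txid arithmetically rather than from segment metadata, the request usually lands mid-segment and trips exactly this kind of check.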



Re: [Vote] Merge branch-trunk-win to trunk

2013-02-27 Thread Todd Lipcon
oop, and we are committed to helping make Hadoop a world-class
> > >>> solution that anyone can use to solve their biggest data challenges.
> > >>>
> > >>> As an immediate next step, we would like to have a discussion around
> > how
> > >>> we can ensure that the quality of the mainline Hadoop branches on
> > >>>Windows
> > >>> is maintained. To this end, we would like to get to the state where
> we
> > >>>have
> > >>> pre-checkin validation gates and nightly test runs enabled on
> Windows.
> > >>>If
> > >>> you have any suggestions around this, please do send an email.  We
> are
> > >>> committed to helping sustain the long-term quality of Hadoop on both
> > >>>Linux
> > >>> and Windows.
> > >>>
> > >>> We sincerely thank the community for their contribution and support
> so
> > >>> far. And hope to continue having a close engagement in the future.
> > >>>
> > >>> -Microsoft HDInsight Team
> > >>>
> > >>>
> > >>> -Original Message-
> > >>> From: Suresh Srinivas [mailto:sur...@hortonworks.com]
> > >>> Sent: Thursday, February 7, 2013 5:42 PM
> > >>> To: common-...@hadoop.apache.org; yarn-...@hadoop.apache.org;
> > >>> hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> > >>> Subject: Heads up - merge branch-trunk-win to trunk
> > >>>
> > >>> The support for Hadoop on Windows was proposed in HADOOP-8079<
> > >>> https://issues.apache.org/jira/browse/HADOOP-8079> almost a year
> ago.
> > >>>The
> > >>> goal was to make Hadoop natively integrated, full-featured, and
> > >>>performance
> > >>> and scalability tuned on Windows Server or Windows Azure.
> > >>> We are happy to announce that a lot of progress has been made in this
> > >>> regard.
> > >>>
> > >>> Initial work started in a feature branch, branch-1-win, based on
> > >>>branch-1.
> > >>> The details related to the work done in the branch can be seen in
> > >>> CHANGES.txt<
> > >>>
> > >>>
> > http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES
> .
> > >>>branch-1-win.txt?view=markup
> > >>> >.
> > >>> This work has been ported to a branch, branch-trunk-win, based on
> > trunk.
> > >>> Merge patch for this is available on
> > >>> HADOOP-8562<https://issues.apache.org/jira/browse/HADOOP-8562>
> > >>> .
> > >>>
> > >>> Highlights of the work done so far:
> > >>> 1. Necessary changes in Hadoop to run natively on Windows. These
> > changes
> > >>> handle differences in platforms related to path names, process/task
> > >>> management etc.
> > >>> 2. Addition of winutils tools for managing file permissions and
> > >>>ownership,
> > >>> user group mapping, hardlinks, symbolic links, chmod, disk
> utilization,
> > >>>and
> > >>> process/task management.
> > >>> 3. Added cmd scripts equivalent to existing shell scripts
> > >>> hadoop-daemon.sh, start and stop scripts.
> > >>> 4. Addition of block placement policy implementation to support cloud
> > >>> environments, more specifically Azure.
> > >>>
> > >>> We are very close to wrapping up the work in branch-trunk-win and
> > >>>getting
> > >>> ready for a merge. Currently the merge patch is passing close to 100%
> > of
> > >>> unit tests on Linux. Soon I will call for a vote to merge this branch
> > >>>into
> > >>> trunk.
> > >>>
> > >>> Next steps:
> > >>> 1. Call for vote to merge branch-trunk-win to trunk, when the work
> > >>> completes and precommit build is clean.
> > >>> 2. Start a discussion on adding Jenkins precommit builds on windows
> and
> > >>> how to integrate that with the existing commit process.
> > >>>
> > >>> Let me know if you have any questions.
> > >>>
> > >>> Regards,
> > >>> Suresh
> > >>>
> > >>>
> > >>
> > >>
> > >>--
> > >>http://hortonworks.com/download/
> > >
> >
>
>
>
> --
> http://hortonworks.com/download/
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-26 Thread Todd Lipcon
DFS-347 merge
>
> Hi all,
>
> I would like to merge the HDFS-347 branch back to trunk.  It's been under
> intensive review and testing for several months.  The branch adds a lot of
> new unit tests, and passes Jenkins as of 2/15 [1]
>
> We have tested HDFS-347 with both random and sequential workloads. The
> short-circuit case is substantially faster [2], and overall performance
> looks very good.  This is especially encouraging given that the initial
> goal of this work was to make security compatible with short-circuit local
> reads, rather than to optimize the short-circuit code path.  We've also
> stress-tested HDFS-347 on a number of clusters.
>
> This initial VOTE is to merge only into trunk.  Just as we have done with
> our other recent merges, we will consider merging into branch-2 after the
> code has been in trunk for a few weeks.
>
> Please cast your vote by EOD Sunday 2/24.
>
> best,
> Colin McCabe
>
> [1]
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&p
> age=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comme
> nt-13579704
>
> [2]
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&p
> age=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comme
> nt-13551755
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
On Wed, Feb 20, 2013 at 4:04 PM, Suresh Srinivas  wrote:
>
> HDFS-347 does not clearly state old short circuit will be removed anywhere
> in the jira or design. If this was made clear in the jira, this discussion
> would
> have happened much earlier than now.
>
> You seem to be taking the comments I am making the wrong way. I am
> supportive of this work. In fact as you see some of us have spent time
> testing this work and have reviewed the code.

The patches even going back as far as last September have all removed
the old code path. I sort of assumed that, if you are taking time to
review the patches, you would have noticed this... additionally,
Colin's comments on the JIRA said as much... eg:

 "The old RPC is now deprecated and will always throw an
AccessControlException, so that older clients will fall back to remote
reads."
"BlockReaderLocal: simpler implementation that uses raw FileChannel
objects. We don't need to cache anything, or make RPCs to the
DataNode."

from his 10/1/2012 patch upload. So, any patch you might have looked
at since then would have clearly removed the old code path.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas  wrote:
>> Given that this is an optimization, and we have a ton of optimizations
>> which don't yet run on Windows, I don't think that should be
>> considered. Additionally, the Windows support has not yet been merged,
>> nor is it in any release, so this isn't a regression.
>>
>
> This is a critical functionality for HBase performance and an optimization
> we consider
> very important to have.

Too bad it doesn't work in any of the normal installations... none of
the packages for Hadoop would allow it to work, given that the data
directories will be owned by HDFS and not world readable, and
tasks/HBase would run as an "hbase" user, which wouldn't have direct
access to the block files.

>>
>> I would be happy to review an addition to the HDFS-347 branch which
>> addresses this issue. But I don't think we should be maintaining two
>> codepaths for the sake of an optimization on a platform which is not
>> yet officially supported on trunk, especially when the old code path
>> is _insecure_ and thus unusable in most environments.
>
>
> I have to disagree. Nowhere in the jira or the design is it explicitly
> stated that
> the old short circuit functionality is being removed. My assumption has been
> that it will not be removed.

I've tried this avenue in the past with other insecure behaviors that were
fixed. Sorry if you were depending on insecure behavior. The project
should move on and not have 3+ ways of implementing the same thing.

> As regards "officially supported", we have been doing
> windows development for
> more than a year. In fact branch-1-win is being used by a lot users. Given
> merging it to branch-1 requires first making it available in trunk, we have
> been doing
> a lot of work in branch-trunk-win. It is almost ready to be merged as well.
>
> I am -1 on removing the existing short circuit until an alternative short
> circuit similar
> to HDFS-347 is available on all platforms.

Great -- are you committed to building this equivalent feature for
Windows, then? On what timeline? From my viewpoint, Windows isn't a
supported platform *right now*, so vetoing based on it seems
meritless.

BTW, the posix_fadvise based readahead is an important optimization
for many workloads, too, but from what I can tell looking at the
Windows branch, it doesn't support it. There are other places in the
Windows branch where performance is going to be worse - eg disabling
the pipelined crc32c implementation will be a 2-3x hit on that code
path.

No one has voted to merge Windows support, and if merging Windows
support means that, from now on, every _optimization_ must work on
Windows, I don't think I will be able to vote +1. The vast majority of
the community is _not_ running Windows, and I don't want to block
progress on the small number of developers who know how to program
against that platform.

If that's the axe hanging over our head with the Windows branch, then
I'm all for saying "good, keep it on a branch and don't merge it to
trunk".

I was hoping we could all work together a bit better here...
contentious merge votes like this just cause cases where different
distros diverge by merging different branches way ahead of the
upstream (eg yours with windows support, ours with 347, etc)

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
On Wed, Feb 20, 2013 at 3:08 PM, Tsz Wo Sze  wrote:
> The reason to keep it around is that HDFS-347 only supports Unix but not
> other OSes.

Given that this is an optimization, and we have a ton of optimizations
which don't yet run on Windows, I don't think that should be
considered. Additionally, the Windows support has not yet been merged,
nor is it in any release, so this isn't a regression.

I would be happy to review an addition to the HDFS-347 branch which
addresses this issue. But I don't think we should be maintaining two
codepaths for the sake of an optimization on a platform which is not
yet officially supported on trunk, especially when the old code path
is _insecure_ and thus unusable in most environments.

Todd

>
> ____
> From: Todd Lipcon 
> To: hdfs-dev@hadoop.apache.org; Tsz Wo Sze 
> Sent: Wednesday, February 20, 2013 3:06 PM
>
> Subject: Re: VOTE: HDFS-347 merge
>
> On Wed, Feb 20, 2013 at 3:01 PM, Tsz Wo Sze  wrote:
>> Also, the patch seems to have removed the existing short-circuit read
>> feature (HDFS-2246).  It is an incompatible change.  I think the patch is
>> farther away from being ready and I would keep my -1.
>
> The existing short circuit feature is insecure and was always
> considered a stop-gap solution. If you read the history of that
> feature, you can find comments like
> https://issues.apache.org/jira/browse/HDFS-4476 where I pointed out
> that it's only a stop-gap solution and the only reason I didn't veto
> is that folks agreed to later replace it with the proper solution
> (HDFS-347).
>
> Given that the API is the same, and this is an implementation detail,
> it is not incompatible. There is no reason to keep the old
> implementation around: it is both slower and unusable in the vast
> majority of clusters, where the data directories are owned by an HDFS
> user, and users of the cluster run under other unix credentials.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
On Wed, Feb 20, 2013 at 3:06 PM, Todd Lipcon  wrote:
> https://issues.apache.org/jira/browse/HDFS-4476 where I pointed out

sorry, meant to link to:
https://issues.apache.org/jira/browse/HDFS-2246?focusedCommentId=13102013&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13102013
clipboard fail...

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
On Wed, Feb 20, 2013 at 3:01 PM, Tsz Wo Sze  wrote:
> Also, the patch seems to have removed the existing short-circuit read feature 
> (HDFS-2246).  It is an incompatible change.  I think the patch is farther 
> away from being ready and I would keep my -1.

The existing short circuit feature is insecure and was always
considered a stop-gap solution. If you read the history of that
feature, you can find comments like
https://issues.apache.org/jira/browse/HDFS-4476 where I pointed out
that it's only a stop-gap solution and the only reason I didn't veto
is that folks agreed to later replace it with the proper solution
(HDFS-347).

Given that the API is the same, and this is an implementation detail,
it is not incompatible. There is no reason to keep the old
implementation around: it is both slower and unusable in the vast
majority of clusters, where the data directories are owned by an HDFS
user, and users of the cluster run under other unix credentials.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
On Wed, Feb 20, 2013 at 2:49 PM, Suresh Srinivas  wrote:
> Todd,
>
> Some of us have been trying to help test and review the code. However you
> might have missed the following, which has resulted in the review not
> completing:
>
> 02/06/13 - After intent for merge was sent, I posted comment saying
> consolidate patch has extraneous changes. That was non trivial amount of
> extraneous changes.

This was just an error with the "consolidate merge patch". Like I said
in the previous email, these patches are just for Jenkins QA to run
on, and I assume that any HDFS committer is able to look at the branch
itself to understand the changes in it. It's easy to accidentally end
up with extraneous changes when you try to generate these merge
patches - eg the same thing happened to you earlier this week on
HADOOP-8562 if I'm not mistaken.

> 02/06/13 - Nicholas posted some comments and also indicated previous
> unaddressed comments.
Colin addressed this feedback on 2/8 in
https://issues.apache.org/jira/browse/HDFS-4476 . Nicholas chose not
to review the changes (though acknowledged the JIRA on 2/6), so Aaron
committed it a week later.

> 02/15/13 - No update was made to the consolidated patch. I stopped reviewing
> it while waiting for a new one. A new patch was posted on 2/15, and soon
> after came the merge vote email on 2/17/13 during the long weekend.
>

Again, all the changes are in the branch. Again I can't imagine trying
to review a merge of a branch by looking at a 400KB patch. They're
just there to trigger Jenkins. It should be the responsibility of
committers to look at the branch itself. Or if you prefer a single
patch, it's trivial to generate one in your local repo.

> At this time, some of the comments that were made earlier have not been
> addressed. Also folks who were reviewing the consolidated patch have not
> posted +1.

It seems like the best way to trigger people actually reviewing
branches is to call merge votes. I would have hoped that people would
review the work as it went along. If we waited for a +1 without
calling a merge vote, this would drag on for months and months. This
is based on my experience with the 3 or 4 branches I've worked on.

>
> I think we should wait for +1 for the merge patch (from the folks actively
> reviewing the patch) before the merge vote. That might make this process
> smoother. But I agree, if the changes are deemed to be trivial, we can do
> it post merge to trunk.

The difficulty is defining "actively reviewing the patch". Making 3 or
4 cursory comments once every 2 weeks doesn't look like "active
review" to me. On the other hand, I spent probably 40-50% of my time
over the last month reviewing and testing this branch and have voted
+1.

-Todd

>
>
> On Wed, Feb 20, 2013 at 12:16 PM, Todd Lipcon  wrote:
>
>> Hi Nicholas,
>>
>> I looked at your comments on the JIRA, and they all seem like trivial
>> things that could be addressed post-merge, and none of them would
>> affect the functionality. If Colin addresses these issues, will you
>> amend your vote to +1 within the called-for voting period?
>>
>> It concerns me that we've been asking for reviews on this branch for
>> multiple months now, and yet you're only bringing up some of these
>> things now that a merge vote is called. Colin sent a note to this
>> list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying
>> that the merge was coming soon. Since then, we found a few small bugs
>> around the configuration/setup code, but all of the things you're
>> bringing up in the review now have been in the branch since the new
>> year, so I feel like there has been quite ample time for review.
>>
>> -Todd
>>
>> On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze  wrote:
>> > -1
>> > The patch seems not ready yet.  I have posted some comments/suggestions
>> on the JIRA.  Colin also has agreed that there are some bugs to be fixed.
>>  Sorry.
>> >
>> > Tsz-Wo
>> >
>> >
>> >
>> >
>> > 
>> >  From: Todd Lipcon 
>> > To: hdfs-dev@hadoop.apache.org
>> > Sent: Tuesday, February 19, 2013 4:11 PM
>> > Subject: Re: VOTE: HDFS-347 merge
>> >
>> > +1 (binding)
>> >
>> > I code-reviewed almost all of the code in this branch, and also spent
>> some
>> > time benchmarking and testing under various workloads. We've also done
>> > significant testing on clusters here at Cloudera, both secure and
>> insecure,
>> > and verified integration with a number of other ecosystem components (eg
>> > Pig, Hive, Impala, HBase, MR, etc

Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
The point isn't when the consolidated patches were posted to the JIRA.
The branch is on the public SVN and has been for months. The work was
done incrementally on the branch, and you were welcome (and
encouraged) to review it all along. The consolidated patches are
mostly there in order to get a "Hadoop QA" run against the branch,
since we can't currently run the QA bot on anything but trunk.

-Todd

On Wed, Feb 20, 2013 at 1:48 PM, Tsz Wo Sze  wrote:
> The previous patch of HDFS-347 was posted on Jan 31
> (2013.01.31.consolidated2.patch).  I tried to review it, but the code was
> quite unreadable at that time.  Then, the next patch is the latest one,
> 2013.02.15.consolidated4.patch, posted in the evening of Feb 15, right before
> the weekend.  As mentioned previously, I did not get a chance to check it
> until yesterday (Feb 19).
>
> The current patch is still not ready.  It seems to have unnecessarily
> changed the API and protocol.  I believe those are important but not trivial
> things.
>
> Tsz-Wo
>
>
> 
> From: Todd Lipcon 
> To: hdfs-dev@hadoop.apache.org; Tsz Wo Sze 
> Sent: Wednesday, February 20, 2013 12:16 PM
>
> Subject: Re: VOTE: HDFS-347 merge
>
> Hi Nicholas,
>
> I looked at your comments on the JIRA, and they all seem like trivial
> things that could be addressed post-merge, and none of them would
> affect the functionality. If Colin addresses these issues, will you
> amend your vote to +1 within the called-for voting period?
>
> It concerns me that we've been asking for reviews on this branch for
> multiple months now, and yet you're only bringing up some of these
> things now that a merge vote is called. Colin sent a note to this
> list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying
> that the merge was coming soon. Since then, we found a few small bugs
> around the configuration/setup code, but all of the things you're
> bringing up in the review now have been in the branch since the new
> year, so I feel like there has been quite ample time for review.
>
> -Todd
>
> On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze  wrote:
>> -1
>> The patch seems not ready yet.  I have posted some comments/suggestions on
>> the JIRA.  Colin also has agreed that there are some bugs to be fixed.
>> Sorry.
>>
>> Tsz-Wo
>>
>>
>>
>>
>> 
>>  From: Todd Lipcon 
>> To: hdfs-dev@hadoop.apache.org
>> Sent: Tuesday, February 19, 2013 4:11 PM
>> Subject: Re: VOTE: HDFS-347 merge
>>
>> +1 (binding)
>>
>> I code-reviewed almost all of the code in this branch, and also spent some
>> time benchmarking and testing under various workloads. We've also done
>> significant testing on clusters here at Cloudera, both secure and
>> insecure,
>> and verified integration with a number of other ecosystem components (eg
>> Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and
>> should provide much better performance for a number of workloads,
>> especially in secure environments.
>>
>> Thanks for the hard work, Colin!
>>
>> -Todd
>>
>> On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe
>> wrote:
>>
>>> Hi all,
>>>
>>> I would like to merge the HDFS-347 branch back to trunk.  It's been
>>> under intensive review and testing for several months.  The branch
>>> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1]
>>>
>>> We have tested HDFS-347 with both random and sequential workloads. The
>>> short-circuit case is substantially faster [2], and overall
>>> performance looks very good.  This is especially encouraging given
>>> that the initial goal of this work was to make security compatible
>>> with short-circuit local reads, rather than to optimize the
>>> short-circuit code path.  We've also stress-tested HDFS-347 on a
>>> number of clusters.
>>>
> >>> This initial VOTE is to merge only into trunk.  Just as we have done
>>> with our other recent merges, we will consider merging into branch-2
> >>> after the code has been in trunk for a few weeks.
>>>
>>> Please cast your vote by EOD Sunday 2/24.
>>>
>>> best,
>>> Colin McCabe
>>>
>>> [1]
>>>
>>> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704
>>>
>>> [2]
>>>
>>> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-20 Thread Todd Lipcon
Hi Nicholas,

I looked at your comments on the JIRA, and they all seem like trivial
things that could be addressed post-merge, and none of them would
affect the functionality. If Colin addresses these issues, will you
amend your vote to +1 within the called-for voting period?

It concerns me that we've been asking for reviews on this branch for
multiple months now, and yet you're only bringing up some of these
things now that a merge vote is called. Colin sent a note to this
list a month ago (http://markmail.org/message/phcfc3watwlqiemw) saying
that the merge was coming soon. Since then, we found a few small bugs
around the configuration/setup code, but all of the things you're
bringing up in the review now have been in the branch since the new
year, so I feel like there has been quite ample time for review.

-Todd

On Wed, Feb 20, 2013 at 11:56 AM, Tsz Wo Sze  wrote:
> -1
> The patch seems not ready yet.  I have posted some comments/suggestions on 
> the JIRA.  Colin also has agreed that there are some bugs to be fixed.  Sorry.
>
> Tsz-Wo
>
>
>
>
> ____
>  From: Todd Lipcon 
> To: hdfs-dev@hadoop.apache.org
> Sent: Tuesday, February 19, 2013 4:11 PM
> Subject: Re: VOTE: HDFS-347 merge
>
> +1 (binding)
>
> I code-reviewed almost all of the code in this branch, and also spent some
> time benchmarking and testing under various workloads. We've also done
> significant testing on clusters here at Cloudera, both secure and insecure,
> and verified integration with a number of other ecosystem components (eg
> Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and
> should provide much better performance for a number of workloads,
> especially in secure environments.
>
> Thanks for the hard work, Colin!
>
> -Todd
>
> On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe wrote:
>
>> Hi all,
>>
>> I would like to merge the HDFS-347 branch back to trunk.  It's been
>> under intensive review and testing for several months.  The branch
>> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1]
>>
>> We have tested HDFS-347 with both random and sequential workloads. The
>> short-circuit case is substantially faster [2], and overall
>> performance looks very good.  This is especially encouraging given
>> that the initial goal of this work was to make security compatible
>> with short-circuit local reads, rather than to optimize the
>> short-circuit code path.  We've also stress-tested HDFS-347 on a
>> number of clusters.
>>
>> This initial VOTE is to merge only into trunk.  Just as we have done
>> with our other recent merges, we will consider merging into branch-2
>> after the code has been in trunk for a few weeks.
>>
>> Please cast your vote by EOD Sunday 2/24.
>>
>> best,
>> Colin McCabe
>>
>> [1]
>> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704
>>
>> [2]
>> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: VOTE: HDFS-347 merge

2013-02-19 Thread Todd Lipcon
+1 (binding)

I code-reviewed almost all of the code in this branch, and also spent some
time benchmarking and testing under various workloads. We've also done
significant testing on clusters here at Cloudera, both secure and insecure,
and verified integration with a number of other ecosystem components (eg
Pig, Hive, Impala, HBase, MR, etc). The feature works as advertised and
should provide much better performance for a number of workloads,
especially in secure environments.

Thanks for the hard work, Colin!

-Todd

On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe wrote:

> Hi all,
>
> I would like to merge the HDFS-347 branch back to trunk.  It's been
> under intensive review and testing for several months.  The branch
> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1]
>
> We have tested HDFS-347 with both random and sequential workloads. The
> short-circuit case is substantially faster [2], and overall
> performance looks very good.  This is especially encouraging given
> that the initial goal of this work was to make security compatible
> with short-circuit local reads, rather than to optimize the
> short-circuit code path.  We've also stress-tested HDFS-347 on a
> number of clusters.
>
> This initial VOTE is to merge only into trunk.  Just as we have done
> with our other recent merges, we will consider merging into branch-2
> after the code has been in trunk for a few weeks.
>
> Please cast your vote by EOD Sunday 2/24.
>
> best,
> Colin McCabe
>
> [1]
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704
>
> [2]
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755
>



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-4496) DFSClient: don't create a domain socket unless we need it

2013-02-12 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4496.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

> DFSClient: don't create a domain socket unless we need it
> -
>
> Key: HDFS-4496
> URL: https://issues.apache.org/jira/browse/HDFS-4496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4496.001.patch
>
>
> If we don't have conf.domainSocketDataTraffic or conf.shortCircuitLocalReads 
> set, the client shouldn't create a domain socket because we couldn't use it.  
> This is only an issue if you misconfigure things, but it's still good to fix.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4486) Add log category for long-running DFSClient notices

2013-02-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4486:
-

 Summary: Add log category for long-running DFSClient notices
 Key: HDFS-4486
 URL: https://issues.apache.org/jira/browse/HDFS-4486
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Todd Lipcon
Priority: Minor


There are a number of features in the DFS client which are transparent but can 
make a fairly big difference for performance -- two in particular are short 
circuit reads and native checksumming. Because we don't want log spew for 
clients like "hadoop fs -cat" we currently log only at DEBUG level when these 
features are disabled. This makes it difficult to troubleshoot/verify for 
long-running perf-sensitive clients like HBase.

One simple solution is to add a new log category - eg 
o.a.h.h.DFSClient.PerformanceAdvisory - which long-running clients could enable 
at DEBUG level without getting the full debug spew.
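The proposal above can be sketched with a dedicated logger category. This is a minimal stdlib sketch, not the actual implementation: Hadoop of that era used commons-logging/log4j, and the category name is simply the one proposed in the text.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of a separate "performance advisory" log category.
public class PerformanceAdvisory {
    // Long-running clients can enable just this category at debug level
    // without turning on debug logging for the entire DFS client.
    public static final Logger LOG =
        Logger.getLogger("org.apache.hadoop.hdfs.DFSClient.PerformanceAdvisory");

    public static void adviseDisabled(String feature, String reason) {
        LOG.log(Level.FINE, "{0} is disabled: {1}", new Object[] {feature, reason});
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.FINE); // opt in for this one category only
        adviseDisabled("short-circuit reads", "domain socket path not configured");
    }
}
```

A client like HBase would flip only this category to debug in its logging config, leaving the rest of the DFS client quiet.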



[jira] [Created] (HDFS-4485) HDFS-347: DN should chmod socket path a+w

2013-02-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4485:
-

 Summary: HDFS-347: DN should chmod socket path a+w
 Key: HDFS-4485
 URL: https://issues.apache.org/jira/browse/HDFS-4485
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
Priority: Critical


In cluster-testing HDFS-347, we found that in clusters where the MR job doesn't 
run as the same user as HDFS, clients wouldn't use short circuit read because 
of a 'permission denied' error connecting to the socket. It turns out that, in 
order to connect to a socket, clients need write permissions on the socket file.

The DN should set these permissions automatically after it creates the socket.
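A minimal sketch of the permission step (not the actual DataNode code; Java's standard library of the time could not even bind UNIX domain sockets, which is why HDFS-347 uses JNI). The point is only that connecting to a UNIX domain socket requires write permission on the socket file, so the DN should chmod it a+w after binding:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch: make a bound socket file world-writable so clients
// running as other users (e.g. MR tasks) can connect to it.
public class SocketPerms {
    public static Set<PosixFilePermission> allWritable() {
        return EnumSet.of(
            PosixFilePermission.OWNER_READ,  PosixFilePermission.OWNER_WRITE,
            PosixFilePermission.GROUP_READ,  PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE);
    }

    public static void chmodSocket(Path socketPath) throws IOException {
        // connect(2) on a pathname socket requires write permission on the file,
        // so a+w is what lets other users' jobs use short-circuit reads.
        Files.setPosixFilePermissions(socketPath, allWritable());
    }
}
```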



Re: Release numbering for branch-2 releases

2013-02-04 Thread Todd Lipcon
On Mon, Feb 4, 2013 at 2:14 PM, Suresh Srinivas wrote:

>
> Why? Can you please share some reasons?
>
> I actually think alpha and beta and stable/GA are much better way to set
> the expectation
> of the quality of a release. This has been practiced in software release
> cycle for a long time.
> Having an option to release alpha is good for releasing early and getting
> feedback from
> people who can try it out and at the same time warning other not so
> adventurous users on
> quality expectation.
>
>
My issue with the current scheme is that there is little definition as to
what alpha/beta/stable means. We're trying to boil down a complex issue
into a simple tag which doesn't well capture the various subtleties. For
example, different people may variously use the terms to describe:

- Quality/completeness: for example, missing docs, buggy UIs, difficult
setup/install, etc
- Safety: for example, potential bugs which may risk data loss
- Stability: for example, potential bugs which may risk uptime
- End-user API compatibility: will user-facing APIs change in this version?
(affecting those who write MR jobs)
- Framework-developer API compatibility: will YARN-internal APIs change in
this version? (affecting those who write non-MR YARN frameworks)
- Binary compatibility: can I continue to use my application (or YARN)
framework compiled against an old version with this version, without a
recompile?
- Intra-cluster wire compatibility: can I rolling-upgrade from A to B?
- Client-server wire compatibility: can I use old clients to talk to an
upgraded cluster?

Depending on the user's expectations and needs, different factors above may
be significantly more or less important. And different portions of the
software may have different levels of stability in each of the areas. As
I've mentioned in previous threads, my experiences supporting production
Hadoop 1.x and Hadoop 2.x HDFS clusters have led me to believe that 2.x,
while being "alpha", is significantly less prone to data loss bugs than 1.x
in Hadoop. But, with some of the changes in the proposed 2.0.3-alpha, it
wouldn't be wire-protocol-stable.

How can we best devise a scheme that explains the various factors above in
a more detailed way than one big red warning sticker? What of the above
factors does the community think would be implied by "GA"?

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HDFS-4433) make TestPeerCache not flaky

2013-01-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4433.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

> make TestPeerCache not flaky
> 
>
> Key: HDFS-4433
> URL: https://issues.apache.org/jira/browse/HDFS-4433
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4433.001.patch
>
>
> TestPeerCache is flaky now because it relies on using the same global cache 
> for every test function.  So the cache timeout can't be set to something 
> different for each test.
> Also, we should implement equals and hashCode for {{FakePeer}}, since 
> otherwise {{testMultiplePeersWithSameDnId}} is not really testing what 
> happens when multiple equal peers are inserted into the cache.  (The default 
> equals is object equality).
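The equals/hashCode fix described above can be sketched as value equality keyed on the datanode ID (class and field names here are illustrative, not the real test code), so that inserting "multiple equal peers" into the cache actually exercises the collision path:

```java
import java.util.Objects;

// Sketch of a fake peer with value equality keyed on its datanode ID.
public class FakePeer {
    private final String datanodeId;

    public FakePeer(String datanodeId) {
        this.datanodeId = datanodeId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FakePeer)) return false;
        // Two peers for the same DN compare equal, unlike the default
        // Object identity semantics mentioned above.
        return datanodeId.equals(((FakePeer) o).datanodeId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(datanodeId);
    }
}
```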



[jira] [Resolved] (HDFS-4416) change dfs.datanode.domain.socket.path to dfs.domain.socket.path

2013-01-21 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4416.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch. Thanks, Colin.

> change dfs.datanode.domain.socket.path to dfs.domain.socket.path
> 
>
> Key: HDFS-4416
> URL: https://issues.apache.org/jira/browse/HDFS-4416
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4416.001.patch, HDFS-4416.002.patch, 
> HDFS-4416.003.patch, HDFS-4416.004.patch
>
>
> {{dfs.datanode.domain.socket.path}} is used by both clients and the DataNode, 
> so it might be best to avoid putting 'datanode' in the name.  Most of the 
> configuration keys that have 'datanode' in the name apply only to the DN.
> Also, should change __PORT__ to _PORT to be consistent with _HOST, etc.



[jira] [Resolved] (HDFS-4418) HDFS-347: increase default FileInputStreamCache size

2013-01-17 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4418.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch.

> HDFS-347: increase default FileInputStreamCache size
> 
>
> Key: HDFS-4418
> URL: https://issues.apache.org/jira/browse/HDFS-4418
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>    Reporter: Todd Lipcon
>    Assignee: Todd Lipcon
> Attachments: hdfs-4418.txt
>
>
> The FileInputStreamCache currently defaults to holding only 10 input stream 
> pairs (corresponding to 10 blocks). In many HBase workloads, the region 
> server will be issuing random reads against a local file which is 2-4GB in 
> size or even larger (hence 20+ blocks).
> Given that the memory usage for caching these input streams is low, and 
> applications like HBase tend to already increase their ulimit -n 
> substantially (eg up to 32,000), I think we should raise the default cache 
> size to 50 or more. In the rare case that someone has an application which 
> uses local reads with hundreds of open blocks and can't feasibly raise their 
> ulimit -n, they can lower the limit appropriately.



[jira] [Created] (HDFS-4418) HDFS-347: increase default FileInputStreamCache size

2013-01-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4418:
-

 Summary: HDFS-347: increase default FileInputStreamCache size
 Key: HDFS-4418
 URL: https://issues.apache.org/jira/browse/HDFS-4418
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The FileInputStreamCache currently defaults to holding only 10 input stream 
pairs (corresponding to 10 blocks). In many HBase workloads, the region server 
will be issuing random reads against a local file which is 2-4GB in size or 
even larger (hence 20+ blocks).

Given that the memory usage for caching these input streams is low, and 
applications like HBase tend to already increase their ulimit -n substantially 
(eg up to 32,000), I think we should raise the default cache size to 50 or 
more. In the rare case that someone has an application which uses local reads 
with hundreds of open blocks and can't feasibly raise their ulimit -n, they can 
lower the limit appropriately.



[jira] [Created] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4417:
-

 Summary: HDFS-347: fix case where local reads get disabled 
incorrectly
 Key: HDFS-4417
 URL: https://issues.apache.org/jira/browse/HDFS-4417
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon


In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
following case:
- a workload is running which puts a bunch of local sockets in the PeerCache
- the workload abates for a while, causing the sockets to go "stale" (ie the DN 
side disconnects after the keepalive timeout)
- the workload starts again

In this case, the local socket retrieved from the cache failed the 
newBlockReader call, and it incorrectly disabled local sockets on that host. 
This is similar to an earlier bug HDFS-3376, but not quite the same.

The next issue we ran into is that, once this happened, it never tried local 
sockets again, because the cache held lots of TCP sockets. Since we always 
managed to get a cached socket to the local node, it didn't bother trying local 
read again.



[jira] [Created] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4403:
-

 Summary: DFSClient can infer checksum type when not provided by 
reading first byte
 Key: HDFS-4403
 URL: https://issues.apache.org/jira/browse/HDFS-4403
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new 
protobuf field is optional, with a default of CRC32. This means that this API, 
when used against an older cluster (like earlier 0.23 releases) will falsely 
return CRC32 even if that cluster has written files with CRC32C. This can cause 
issues for distcp, for example.

Instead of defaulting the protobuf field to CRC32, we can leave it with no 
default, and if the OpBlockChecksumResponseProto has no checksum type set, the 
client can send OP_READ_BLOCK to read the first byte of the block, then grab 
the checksum type out of that response (which has always been present)
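The proposed fallback can be sketched as follows (names are hypothetical, and the actual wire interaction with OP_READ_BLOCK is elided): only trust the response's checksum type when the optional field was actually set; otherwise take the type from a first-byte read, which has always carried it.

```java
// Sketch of the client-side checksum-type inference described above.
public class ChecksumInference {
    public enum ChecksumType { CRC32, CRC32C }

    /**
     * responseType is null when the optional protobuf field was not set,
     * i.e. when talking to an older DataNode.
     */
    public static ChecksumType inferType(ChecksumType responseType,
                                         ChecksumType typeFromFirstByteRead) {
        if (responseType != null) {
            return responseType; // a newer DN told us the type explicitly
        }
        // Older DN: fall back to the type carried by the OP_READ_BLOCK
        // response for the block's first byte, rather than assuming CRC32.
        return typeFromFirstByteRead;
    }
}
```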



[jira] [Resolved] (HDFS-4402) some small DomainSocket fixes: avoid findbugs warning, change log level, etc.

2013-01-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4402.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch, thanks.

> some small DomainSocket fixes: avoid findbugs warning, change log level, etc.
> -
>
> Key: HDFS-4402
> URL: https://issues.apache.org/jira/browse/HDFS-4402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4402.001.patch, HDFS-4402.002.patch
>
>
> Some miscellaneous fixes:
> * findbugs complains about a short-circuit operator in {{DomainSocket.java}} 
> for some reason.  We don't need it (it doesn't help optimization since the 
> expressions lack side-effects), so let's ditch it to avoid the findbugs 
> warning.
> * change the log level of one error message to warn
> * BlockReaderLocal should use a BufferedInputStream to read the metadata file 
> header, to avoid doing multiple small reads.
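The BufferedInputStream point can be sketched like this (field layout and values are illustrative, not the real block-metadata format): buffering turns the several tiny header reads into a single read against the file descriptor.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch: read a small multi-field header through a buffer so the three
// small reads below become one underlying read.
public class MetaHeader {
    public static int[] readHeader(InputStream raw) {
        try {
            DataInputStream in =
                new DataInputStream(new BufferedInputStream(raw, 512));
            // Each call would otherwise hit the stream separately.
            return new int[] { in.readShort(), in.readByte(), in.readInt() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a sample header so the sketch runs without a real meta file. */
    public static InputStream sampleHeader() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeShort(1);  // version (illustrative)
            out.writeByte(2);   // checksum type (illustrative)
            out.writeInt(512);  // bytes per checksum (illustrative)
            return new ByteArrayInputStream(buf.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```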



[jira] [Resolved] (HDFS-4401) Fix bug in DomainSocket path validation

2013-01-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4401.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch. Thanks for the fix.

> Fix bug in DomainSocket path validation
> ---
>
> Key: HDFS-4401
> URL: https://issues.apache.org/jira/browse/HDFS-4401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4401.001.patch
>
>
> DomainSocket path validation currently does not validate the second-to-last 
> path component.  This leads to insecure socket paths being accepted.  It 
> should validate all path components prior to the final one.



[jira] [Resolved] (HDFS-4400) DFSInputStream#getBlockReader: last retries should ignore the cache

2013-01-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4400.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch

> DFSInputStream#getBlockReader: last retries should ignore the cache
> ---
>
> Key: HDFS-4400
> URL: https://issues.apache.org/jira/browse/HDFS-4400
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4400.001.patch
>
>
> In {{DFSInputStream#getBlockReader}}, the last retries to get a {{BlockReader}} 
> should ignore the cache.  This was broken by HDFS-4356, it seems.



[jira] [Resolved] (HDFS-4390) Bypass UNIX domain socket unit tests when they cannot be run

2013-01-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4390.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch

> Bypass UNIX domain socket unit tests when they cannot be run
> 
>
> Key: HDFS-4390
> URL: https://issues.apache.org/jira/browse/HDFS-4390
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: _06.patch
>
>
> Testing revealed that the existing mechanisms for bypassing UNIX domain 
> socket-related tests when they are not available are inadequate.



[jira] [Resolved] (HDFS-4388) DomainSocket should throw AsynchronousCloseException when appropriate

2013-01-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4388.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch. Thanks.

> DomainSocket should throw AsynchronousCloseException when appropriate
> -
>
> Key: HDFS-4388
> URL: https://issues.apache.org/jira/browse/HDFS-4388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Trivial
> Attachments: _05a.patch
>
>
> {{DomainSocket}} should throw {{AsynchronousCloseException}} when appropriate 
> (i.e., when an {{accept}} or other blocking operation is interrupted by a 
> concurrent close.)  This is nicer than throwing a generic {{IOException}} or 
> {{SocketException}}.
> Similarly, we should also throw {{ClosedChannelException}} when an operation 
> is attempted on a closed {{DomainSocket}}.



[jira] [Resolved] (HDFS-4356) BlockReaderLocal should use passed file descriptors rather than paths

2013-01-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4356.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

> BlockReaderLocal should use passed file descriptors rather than paths
> -
>
> Key: HDFS-4356
> URL: https://issues.apache.org/jira/browse/HDFS-4356
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: 04b-cumulative.patch, _04b.patch, _04c.patch, 
> 04-cumulative.patch, 04d-cumulative.patch, _04e.patch, 04f-cumulative.patch, 
> _04f.patch, 04g-cumulative.patch, _04g.patch
>
>
> {{BlockReaderLocal}} should use file descriptors passed over UNIX domain 
> sockets rather than paths.  We also need some configuration options for these 
> UNIX domain sockets.



[jira] [Created] (HDFS-4380) Opening a file for read before writer writes a block causes NPE

2013-01-09 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4380:
-

 Summary: Opening a file for read before writer writes a block 
causes NPE
 Key: HDFS-4380
 URL: https://issues.apache.org/jira/browse/HDFS-4380
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Todd Lipcon


JD Cryans found this issue: it seems like, if you open a file for read 
immediately after it's been created by the writer, after a block has been 
allocated, but before the block is created on the DNs, then you can end up with 
the following NPE:

java.lang.NullPointerException
   at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
   at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
   at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
   at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)

This seems to be because {{getBlockInfo}} returns a null block when the DN 
doesn't yet have the replica. The client should probably either fall back to a 
different replica or treat it as zero-length.



[jira] [Reopened] (HDFS-4352) Encapsulate arguments to BlockReaderFactory in a class

2013-01-08 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HDFS-4352:
---


By the way, I find it _very_ rude to close someone else's ticket as "Invalid" 
or "Wont fix" without waiting for the discussion to end. Just because you don't 
like a change doesn't give you license to do this.

> Encapsulate arguments to BlockReaderFactory in a class
> --
>
> Key: HDFS-4352
> URL: https://issues.apache.org/jira/browse/HDFS-4352
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
> Attachments: 01b.patch, 01.patch
>
>
> Encapsulate the arguments to BlockReaderFactory in a class to avoid having to 
> pass around 10+ arguments to a few different functions.



[jira] [Created] (HDFS-4324) Track and report out-of-date blocks separately from corrupt blocks

2012-12-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4324:
-

 Summary: Track and report out-of-date blocks separately from 
corrupt blocks
 Key: HDFS-4324
 URL: https://issues.apache.org/jira/browse/HDFS-4324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently in various places (metrics, dfsadmin -report, fsck, logs) we use the 
term "corrupt" to refer to blocks which have an out-of-date generation stamp. 
Since out-of-date blocks are a fairly normal occurrence if a DN restarts while 
data is being written, we should avoid using 'scary' words like _corrupt_. 
This may need both some textual changes as well as some internal changes to 
count the corruption types distinctly.



[jira] [Created] (HDFS-4305) Add a configurable limit on number of blocks per file, and min block size

2012-12-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4305:
-

 Summary: Add a configurable limit on number of blocks per file, 
and min block size
 Key: HDFS-4305
 URL: https://issues.apache.org/jira/browse/HDFS-4305
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.2-alpha, 1.0.4, 3.0.0
Reporter: Todd Lipcon
Priority: Minor


We recently had an issue where a user set the block size very very low and 
managed to create a single file with hundreds of thousands of blocks. This 
caused problems with the edit log since the OP_ADD op was so large (HDFS-4304). 
I imagine it could also cause efficiency issues in the NN. To prevent users 
from making such mistakes, we should:
- introduce a configurable minimum block size, below which requests are rejected
- introduce a configurable maximum number of blocks per file, above which 
requests to add another block are rejected (with a suitably high default so as 
not to prevent legitimate large files)
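The two guards might look roughly like this sketch. The limits and method names here are hypothetical placeholders, not the committed configuration keys or defaults:

```java
public class BlockLimits {
    // Hypothetical values; the actual keys and defaults are up to the patch.
    static final long MIN_BLOCK_SIZE = 1024 * 1024;    // e.g. a 1 MB floor
    static final long MAX_BLOCKS_PER_FILE = 1_000_000; // high enough for legit files

    // Reject pathologically small block sizes at file-creation time.
    static void checkBlockSize(long blockSize) {
        if (blockSize < MIN_BLOCK_SIZE) {
            throw new IllegalArgumentException("Block size " + blockSize
                + " is below the configured minimum " + MIN_BLOCK_SIZE);
        }
    }

    // Reject addBlock requests once a file already holds too many blocks.
    static void checkAddBlock(long currentBlockCount) {
        if (currentBlockCount >= MAX_BLOCKS_PER_FILE) {
            throw new IllegalArgumentException("File already has "
                + currentBlockCount + " blocks; limit is " + MAX_BLOCKS_PER_FILE);
        }
    }

    public static void main(String[] args) {
        checkBlockSize(128L * 1024 * 1024); // a typical 128 MB block passes
        boolean rejectedTiny;
        try { checkBlockSize(512); rejectedTiny = false; }
        catch (IllegalArgumentException e) { rejectedTiny = true; }
        boolean rejectedAdd;
        try { checkAddBlock(MAX_BLOCKS_PER_FILE); rejectedAdd = false; }
        catch (IllegalArgumentException e) { rejectedAdd = true; }
        System.out.println(rejectedTiny && rejectedAdd);
    }
}
```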

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4304) Make FSEditLogOp.MAX_OP_SIZE configurable

2012-12-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4304:
-

 Summary: Make FSEditLogOp.MAX_OP_SIZE configurable
 Key: HDFS-4304
 URL: https://issues.apache.org/jira/browse/HDFS-4304
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe


Today we ran into an issue where a NN had logged a very large op, greater than 
the 1.5MB MAX_OP_SIZE constant. In order to successfully load the edits, we had 
to patch with a larger constant. This constant should be configurable so that 
we won't have to recompile in these odd cases. Additionally, I think the 
default should be bumped a bit higher, since it's only a safeguard against 
OOME, and people tend to run NNs with multi-GB heaps.
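A minimal sketch of making the constant configurable follows. The config key and default below are hypothetical, and `java.util.Properties` stands in for Hadoop's `Configuration` class:

```java
import java.util.Properties;

public class EditLogLimits {
    // Hypothetical key name; the actual one is chosen by the patch.
    static final String MAX_OP_SIZE_KEY = "dfs.namenode.max.op.size";
    // Bumped well above the old 1.5 MB constant; still a safeguard against OOME.
    static final int DEFAULT_MAX_OP_SIZE = 50 * 1024 * 1024;

    // Read the limit from configuration, falling back to the default.
    static int maxOpSize(Properties conf) {
        String v = conf.getProperty(MAX_OP_SIZE_KEY);
        return v == null ? DEFAULT_MAX_OP_SIZE : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        boolean defaultOk = maxOpSize(conf) == DEFAULT_MAX_OP_SIZE;
        conf.setProperty(MAX_OP_SIZE_KEY, String.valueOf(100 * 1024 * 1024));
        System.out.println(defaultOk && maxOpSize(conf) == 100 * 1024 * 1024);
    }
}
```

An oversized op would then fail with a message naming the key, so an operator can raise the limit without recompiling.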

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4301) 2NN image transfer timeout problematic

2012-12-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4301:
-

 Summary: 2NN image transfer timeout problematic
 Key: HDFS-4301
 URL: https://issues.apache.org/jira/browse/HDFS-4301
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical


HDFS-1490 added a timeout on image transfer. But the timeout seems to apply to 
the entirety of the image transfer operation. So, if the image or edits are 
large (multiple GB) or the transfer is heavily throttled, the transfer is 
likely to time out repeatedly.
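The distinction can be sketched with plain `HttpURLConnection` (this is an illustration under assumed names, not the actual `TransferFsImage` code): connect and read timeouts bound each socket operation, so a slow but steadily progressing multi-GB transfer never trips them, whereas a deadline on the whole operation would:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class ImageTransfer {
    static long download(URL url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs); // bounds connection setup only
        conn.setReadTimeout(readTimeoutMs);       // bounds each read(), not the whole transfer
        long total = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] image = new byte[256 * 1024]; // stand-in for an fsimage
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/getimage", ex -> {
            ex.sendResponseHeaders(200, image.length);
            try (OutputStream os = ex.getResponseBody()) { os.write(image); }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/getimage");
            System.out.println(download(url, 1000, 1000) == image.length);
        } finally {
            server.stop(0);
        }
    }
}
```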

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4300) TransferFsImage.downloadEditsToStorage should use a tmp file for destination

2012-12-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4300:
-

 Summary: TransferFsImage.downloadEditsToStorage should use a tmp 
file for destination
 Key: HDFS-4300
 URL: https://issues.apache.org/jira/browse/HDFS-4300
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical


Currently, in TransferFsImage.downloadEditsToStorage, we download the edits 
file directly to its finalized path. If the transfer fails in the middle, a 
half-written file is left behind and cannot be distinguished from a complete 
file. Future checkpoints by the 2NN will then fail, since the file is truncated 
in the middle -- but the 2NN will never download a good copy because it thinks 
it already has the proper file.
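The usual fix for this failure mode, sketched here with `java.nio` and a hypothetical helper name rather than the actual HDFS patch, is to write to a temporary sibling file and atomically rename it only after the copy completes, so a finalized path can only ever name a complete file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeDownload {
    // Download to a ".tmp" sibling first; only an atomic rename produces the
    // finalized path, so a half-written file is never mistaken for a good one.
    static void downloadToStorage(InputStream src, Path finalPath) throws IOException {
        Path tmp = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
        try {
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // clean up if the copy or rename failed
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("edits");
        Path dst = dir.resolve("edits_0000001-0000100");
        byte[] payload = "edit log bytes".getBytes();
        downloadToStorage(new ByteArrayInputStream(payload), dst);
        System.out.println(Files.exists(dst) && Files.size(dst) == payload.length);
    }
}
```

Because the temporary file sits in the same directory as the destination, the rename stays within one filesystem and can be atomic.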

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4298) StorageRetentionManager spews warnings when used with QJM

2012-12-10 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4298:
-

 Summary: StorageRetentionManager spews warnings when used with QJM
 Key: HDFS-4298
 URL: https://issues.apache.org/jira/browse/HDFS-4298
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Todd Lipcon
Assignee: Aaron T. Myers


When the NN is configured with a QJM, we see the following warning message 
every time a checkpoint is made or uploaded:
12/12/10 16:07:52 WARN namenode.FSEditLog: Unable to determine input streams 
from QJM to [127.0.0.1:13001, 127.0.0.1:13002, 127.0.0.1:13003]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions 
to achieve quorum size 2/3. 3 exceptions thrown:
127.0.0.1:13002: Asked for firstTxId 114837 which is in the middle of file 
/tmp/jn-2/myjournal/current/edits_0095185-0114846
...

This is because, since HDFS-2946, the NN calls {{selectInputStreams}} to 
determine the number of log segments and put a cap on that number. In the QJM 
case, this API throws an exception if the requested transaction ID falls in the 
middle of an edit log segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4110) Refine JNStorage log

2012-12-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4110.
---

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha

Committed backport to branch-2 (same patch applied)

> Refine JNStorage log
> 
>
> Key: HDFS-4110
> URL: https://issues.apache.org/jira/browse/HDFS-4110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: liang xie
>Assignee: liang xie
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-4110.txt
>
>
> Abstract class Storage has a toString method: 
> {quote}
> return "Storage Directory " + this.root;
> {quote}
> and in the subclass JNStorage we could see:
> {quote}
> LOG.info("Formatting journal storage directory " + 
> sd + " with nsid: " + getNamespaceID());
> {quote}
> that will print something like "Formatting journal storage directory Storage 
> Directory x"
> Just a one-line change to:
> {quote}
> LOG.info("Formatting journal " + sd + " with nsid: " + getNamespaceID());
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HDFS-4110) Refine JNStorage log

2012-12-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HDFS-4110:
---


Reopening to backport to branch-2

> Refine JNStorage log
> 
>
> Key: HDFS-4110
> URL: https://issues.apache.org/jira/browse/HDFS-4110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: liang xie
>Assignee: liang xie
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HDFS-4110.txt
>
>
> Abstract class Storage has a toString method: 
> {quote}
> return "Storage Directory " + this.root;
> {quote}
> and in the subclass JNStorage we could see:
> {quote}
> LOG.info("Formatting journal storage directory " + 
> sd + " with nsid: " + getNamespaceID());
> {quote}
> that will print something like "Formatting journal storage directory Storage 
> Directory x"
> Just a one-line change to:
> {quote}
> LOG.info("Formatting journal " + sd + " with nsid: " + getNamespaceID());
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2012-12-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3077.
---

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha

Committed backport to branch-2. Thanks for looking at the backport patch, 
Andrew and Aaron.

> Quorum-based protocol for reading and writing edit logs
> ---
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, namenode
>    Reporter: Todd Lipcon
>    Assignee: Todd Lipcon
> Fix For: 3.0.0, QuorumJournalManager (HDFS-3077), 2.0.3-alpha
>
> Attachments: hdfs-3077-branch-2.txt, hdfs-3077-partial.txt, 
> hdfs-3077-test-merge.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, 
> hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, 
> qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, 
> qjournal-design.tex, qjournal-design.tex
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

