[jira] [Created] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test

2015-02-17 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-7807:
--

 Summary: libhdfs htable.c: fix htable resizing, add unit test
 Key: HDFS-7807
 URL: https://issues.apache.org/jira/browse/HDFS-7807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


libhdfs htable.c: fix htable resizing, add unit test





Re: Theory question: good values for FileStatus.getBlockSize()

2015-02-17 Thread Colin P. McCabe
In the past, "block size" and "size of block N" were completely
separate concepts in HDFS.

The former was often referred to as the "default block size" or
"preferred block size" or some such thing.  Basically, it was the point
at which we'd call it a day and move on to the next block: whenever any
block got that big, we started a new one.  "default block size" was
pretty much always 128MB
or 256MB in Real Clusters (although sometimes Apache Parquet would set
it as high as 1GB).  We got tired of people configuring ridiculously
small block sizes by accident so HDFS-4305 added
dfs.namenode.fs-limits.min-block-size.

In the old world, the only block which could be smaller than the
"default block size" was the final block of a file.  MR used default
block size as a guide to doing partitioning and we sort of ignored the
fact that the last block could be less than that.

Now that HDFS-3689 has been added to branch-2, it is no longer true
that all the blocks are the same size except the last one.  The
ramifications of this are still to be determined.  dfs.blocksize will
still be an upper bound on block size, but it will no longer be a
lower bound.
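
As an aside, the preferred block size is just a per-file attribute that
the client passes at create time.  A quick sketch, assuming a running
HDFS and a made-up path; the NameNode will reject values below
dfs.namenode.fs-limits.min-block-size:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreferredBlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // The last argument is the per-file "preferred block size" (128MB here).
    FSDataOutputStream out = fs.create(new Path("/tmp/example"), true,
        conf.getInt("io.file.buffer.size", 4096), (short) 3,
        128L * 1024 * 1024);
    out.writeBytes("hello");
    out.close();
  }
}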

To answer your specific question: in HDFS, FileStatus#getBlockSize
will return the "preferred block size," not the size of any specific
block.  So it's totally possible that none of the blocks in the file
actually have the size returned in FileStatus#getBlockSize.

The relevant code is here in FSDirectory.java:
> if (node.isFile()) {
>   final INodeFile fileNode = node.asFile();
>   size = fileNode.computeFileSize(snapshot);
>   replication = fileNode.getFileReplication(snapshot);
>   blocksize = fileNode.getPreferredBlockSize();
>   isEncrypted = (feInfo != null) ||
>  (isRawPath && isInAnEZ(INodesInPath.fromINode(node)));
> } else {
>  isEncrypted = isInAnEZ(INodesInPath.fromINode(node));
> }
...
> return new HdfsFileStatus(
> ...
>   blocksize,
> ...
>   );
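
If you want to see the difference for a given file, compare
FileStatus#getBlockSize against the actual block lengths.  A quick
sketch (assuming a running HDFS; pass an existing file path as the
argument):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeVsBlockLengths {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(new Path(args[0]));
    // The per-file "preferred block size", not the size of any real block.
    System.out.println("preferred block size = " + stat.getBlockSize());
    // The actual blocks can all be shorter than the preferred size.
    for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
      System.out.println("block at offset " + loc.getOffset()
          + " has length " + loc.getLength());
    }
  }
}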

Probably s3 and the rest of the alternative FS gang should just return
the value of some configuration variable (possibly fs.local.block.size
or dfs.blocksize?).  Even though "preferred block size" is a
completely bogus concept in s3, MapReduce and other frameworks still
use it to calculate splits.  Since s3 never does local reads anyway,
there is no reason to prefer any block size over any other, except in
terms of dividing up the work.
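
For reference, the split calculation is basically this (a sketch of
FileInputFormat-style split sizing, not the actual MR code):

class SplitSizeSketch {
  // minSize ~ mapreduce.input.fileinputformat.split.minsize
  // maxSize ~ mapreduce.input.fileinputformat.split.maxsize
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    // A block size of 0 collapses every split down to minSize, which is why
    // a FileSystem returning 0 from getBlockSize() breaks partitioning.
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }
}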

regards,
Colin

On Mon, Feb 16, 2015 at 9:44 AM, Steve Loughran  wrote:
>
> HADOOP-11601 tightens up the filesystem spec by saying "if len(file) > 0, 
> getFileStatus().getBlockSize() > 0"
>
> this is to stop filesystems (most recently s3a) returning 0 as a block size, 
> which then kills any analytics work that tries to partition the workload by 
> blocksize.
>
> I'm currently changing the markdown text to say
>
> MUST be >0 for a file size >0
> MAY be 0 for a file of size==0.
>
> + the relevant tests to check this.
>
> There's one thing I do want to understand from HDFS first: what about small 
> files? That is: what does HDFS return as a blocksize if a file is smaller 
> than its block size?
>
> -Steve


Re: Datanode synchronization is horrible. I’m thinking we can use ReentrantReadWriteLock for synchronization. What do you guys think?

2015-02-17 Thread Todd Lipcon
You might also be interested in
https://issues.apache.org/jira/browse/HDFS-1148 which I worked on a bit a
number of years back. Per the last comment in that JIRA, I don't think it's
very valuable anymore given the predominance of short-circuit reads in high
performance workloads these days. If you've got some jstacks showing high
contention on these locks under some workload, though, it would be
interesting to see them.

-Todd

On Tue, Feb 17, 2015 at 2:18 PM, Colin P. McCabe  wrote:

> In general, the DN does not perform reads from files under a big lock.
> We only need the lock for protecting the replica map and some of the
> block state.  This lock hasn't really been a big problem in the past
> and I would hesitate to add complexity here (although I haven't
> thought about it that hard at all, so maybe I'm wrong!)
>
> Are you sure that you are not hitting HDFS-7489?
>
> In general, the client normally does some readahead of a few kb to
> avoid swamping the DN with tons of tiny requests.  Tons of tiny
> requests is a bad idea for many other reasons (RPC overhead, seek
> overhead, etc. etc.)
>
> You can also look into using short-circuit reads to avoid the DataNode
> overhead altogether for local reads, which a lot of high-performance
> systems do.
>
> regards,
> Colin
>
> On Sat, Feb 14, 2015 at 10:43 PM, Sukunhui (iEBP) 
> wrote:
> > I have a cluster that writes/reads/deletes lots of small files.
> > I dumped the stack of one Datanode and found that the Datanode has more
> than 100 sessions for reading/writing blocks, with 100+ DataXceiver threads
> waiting to lock <0x7f9b26ce9530> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> >
> > I find that FsDatasetImpl.java and ReplicaMap.java use the
> `synchronized` keyword for synchronization in a lot of places. It’s horrible.
> > First, locking for every read is unnecessary and decreases
> concurrency.
> > Second, Java monitors (synchronized/wait/notify/notifyAll) are
> non-fair (
> http://stackoverflow.com/questions/11275699/synchronized-release-order),
> which will cause many DFSClient timeouts.
> >
> > I’m thinking we can use ReentrantReadWriteLock for synchronization. What
> do you guys think?
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Datanode synchronization is horrible. I’m thinking we can use ReentrantReadWriteLock for synchronization. What do you guys think?

2015-02-17 Thread Colin P. McCabe
In general, the DN does not perform reads from files under a big lock.
We only need the lock for protecting the replica map and some of the
block state.  This lock hasn't really been a big problem in the past
and I would hesitate to add complexity here (although I haven't
thought about it that hard at all, so maybe I'm wrong!)

Are you sure that you are not hitting HDFS-7489?

In general, the client normally does some readahead of a few kb to
avoid swamping the DN with tons of tiny requests.  Tons of tiny
requests is a bad idea for many other reasons (RPC overhead, seek
overhead, etc. etc.)

You can also look into using short-circuit reads to avoid the DataNode
overhead altogether for local reads, which a lot of high-performance
systems do.
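
Enabling short-circuit reads is just client/DataNode configuration;
roughly something like this fragment (the domain socket path is
site-specific, the value below is only an example):

// Client-side sketch; the same socket path must be configured on the DNs.
Configuration conf = new Configuration();
conf.setBoolean("dfs.client.read.shortcircuit", true);
conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
FileSystem fs = FileSystem.get(conf);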

regards,
Colin

On Sat, Feb 14, 2015 at 10:43 PM, Sukunhui (iEBP)  wrote:
> I have a cluster that writes/reads/deletes lots of small files.
> I dumped the stack of one Datanode and found that the Datanode has more than
> 100 sessions for reading/writing blocks, with 100+ DataXceiver threads waiting to
> lock <0x7f9b26ce9530> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>
> I find that FsDatasetImpl.java and ReplicaMap.java use the
> `synchronized` keyword for synchronization in a lot of places. It’s horrible.
> First, locking for every read is unnecessary and decreases concurrency.
> Second, Java monitors (synchronized/wait/notify/notifyAll) are non-fair
> (http://stackoverflow.com/questions/11275699/synchronized-release-order), which
> will cause many DFSClient timeouts.
>
> I’m thinking we can use ReentrantReadWriteLock for synchronization. What do 
> you guys think?
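
For reference, the kind of change being proposed would look roughly like
this generic sketch (hypothetical names, not the actual FsDatasetImpl
code):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic sketch of guarding a replica-map-like structure with a fair
// read/write lock, so concurrent readers don't serialize behind each other.
class ReplicaMapSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair
  private final Map<Long, Object> replicas = new HashMap<>();

  Object get(long blockId) {
    lock.readLock().lock();        // many readers can hold this at once
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  void put(long blockId, Object replica) {
    lock.writeLock().lock();       // writers are exclusive
    try {
      replicas.put(blockId, replica);
    } finally {
      lock.writeLock().unlock();
    }
  }
}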


Re: 2.7 status

2015-02-17 Thread Colin McCabe
+1 for starting to think about releasing 2.7 soon.

Re: building Windows binaries.  Do we release binaries for all the
Linux and UNIX architectures?  I thought we didn't.  It seems a little
inconsistent to release binaries just for Windows, but not for those
other architectures and OSes.  I wonder if we can improve this
situation?

best,
Colin

On Fri, Feb 13, 2015 at 4:36 PM, Karthik Kambatla  wrote:
> 2 weeks from now (end of Feb) sounds reasonable. The one feature I would
> like to be included is shared-cache: we are pretty close - two more
> main items to take care of.
>
> In an offline conversation, Steve mentioned building Windows binaries for
> our releases. Do we want to do that for 2.7? If so, can anyone with Windows
> expertise set up a Jenkins job to build these artifacts, and maybe hook it
> up to https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder/
>
>
>
> On Fri, Feb 13, 2015 at 11:07 AM, Arun Murthy  wrote:
>
>> My bad, been sort of distracted.
>>
>> I agree, we should just roll fwd a 2.7 ASAP with all the goodies.
>>
>> What sort of timing makes sense? 2 weeks hence?
>>
>> thanks,
>> Arun
>>
>> 
>> From: Jason Lowe 
>> Sent: Friday, February 13, 2015 8:11 AM
>> To: common-...@hadoop.apache.org
>> Subject: Re: 2.7 status
>>
>> I'd like to see a 2.7 release sooner than later.  It has been almost 3
>> months since Hadoop 2.6 was released, and there have already been 634 JIRAs
>> committed to 2.7.  That's a lot of changes waiting for an official release.
>>
>> https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
>> Jason
>>
>>   From: Sangjin Lee 
>>  To: "common-...@hadoop.apache.org" 
>>  Sent: Tuesday, February 10, 2015 1:30 PM
>>  Subject: 2.7 status
>>
>> Folks,
>>
>> What is the current status of the 2.7 release? I know initially it started
>> out as a "java-7" only release, but looking at the JIRAs that is very much
>> not the case.
>>
>> Do we have a certain timeframe for 2.7 or is it time to discuss it?
>>
>> Thanks,
>> Sangjin
>>
>>
>>
>
>
> --
> Karthik Kambatla
> Software Engineer, Cloudera Inc.
> 
> http://five.sentenc.es


[jira] [Created] (HDFS-7806) Refactor: move StorageType.java from hadoop-hdfs to hadoop-common

2015-02-17 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-7806:


 Summary: Refactor: move StorageType.java from hadoop-hdfs to 
hadoop-common
 Key: HDFS-7806
 URL: https://issues.apache.org/jira/browse/HDFS-7806
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Xiaoyu Yao
Priority: Minor


We need to migrate the StorageType definition from hadoop-hdfs 
(org.apache.hadoop.hdfs) to hadoop-common (org.apache.hadoop.fs) because 
ContentSummary and FileSystem#getContentSummary() in the org.apache.hadoop.fs 
package need to be enhanced with the storage type quota and usage. 





RE: Block creation in HDFS

2015-02-17 Thread Gangumalla, Uma
Hi,

HDFS stores the data exactly as you write it; it will not reorganize the data. 
HDFS only has flexibility in terms of block placement.
To write in the fashion you describe, a bunch of blocks would have to be allocated 
up front and the client would have to write to all of them according to your 
partitioning, which sounds similar to a striping approach. That is not supported 
right now in HDFS; it is being developed in the erasure coding branch (HDFS-7285). 
Correct me if I misunderstood your needs here.

Regards,
Uma

-Original Message-
From: Abhishek Das [mailto:abhishek.b...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:14 PM
To: hdfs-dev
Subject: Re: Block creation in HDFS

Hi,

Thanks Vinay for your response. I don't need blocks of variable size. But 
setting only the block size probably won't help in my case. Let me give an 
example to explain what I am trying to do.

Let's say the main file has 12 integers, 1 to 12. The block size is such that each 
block will have 3 integers. Now if I ask HDFS to create the blocks, it would 
create 4 blocks - the first one would have 1-3, the second one 4-6, and so on. 
According to my requirement, the data in the main file is partitioned into 3 
clusters: (1,2,3,4), (5,6,7,8) and (9,10,11,12). Now when the blocks are 
created, I need data from all partitions to be represented in each block. So in 
this case, the first block would have (1,5,9), the second one (2,6,10), 
etc. So I want to change how the data is allocated in each of the blocks.

Is it feasible to change the default block creation policy in the current 
implementation?

Regards,
Abhishek Das

On Tue, Feb 17, 2015 at 2:25 AM, Vinayakumar B 
wrote:

> Hi Abhishek,
> Are your partitions all the same size? If yes, then you can set that as 
> the block size.
>
> If not, you can use the latest feature, variable block size,
> to handle your use case.
> You can close the current block after each partition's data is written 
> and append to a new block for the next partition's data.
> This feature is not yet available in any release. Hope to see it 
> in the upcoming 2.7 release. As of now you can try it in any of the 
> trunk/branch-2 builds.
>
> Hope this helps.
>
> -Vinay
> On Feb 17, 2015 8:30 AM, "Abhishek Das"  wrote:
>
> > Hi,
> >
> > I am new to this group. I had a question regarding block creation in
> > HDFS. By default, a file is split into multiple blocks of size equal
> > to the block size. I need to introduce a new block creation policy
> > into the system. In my case the main file is divided into multiple
> > partitions. My goal is to create the blocks so that data from each
> > partition of the file is represented in each block. Is it possible to
> > introduce the new policy? If yes, what would be the starting point in
> > the code I should look at?
> >
> > Regards,
> > Abhishek Das
> >
>


Re: Block creation in HDFS

2015-02-17 Thread Abhishek Das
Hi,

Thanks Vinay for your response. I don't need blocks of variable size. But
setting only the block size probably won't help in my case. Let me give an
example to explain what I am trying to do.

Let's say the main file has 12 integers, 1 to 12. The block size is such that
each block will have 3 integers. Now if I ask HDFS to create the blocks, it
would create 4 blocks - the first one would have 1-3, the second one 4-6, and
so on. According to my requirement, the data in the main file is partitioned
into 3 clusters: (1,2,3,4), (5,6,7,8) and (9,10,11,12). Now when the blocks
are created, I need data from all partitions to be represented in each
block. So in this case, the first block would have (1,5,9), the second one
(2,6,10), etc. So I want to change how the data is allocated in each of the
blocks.
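
For illustration, if I did this purely on the client side, the write
order would look roughly like the sketch below (hypothetical partition
data and path; it only reorders what gets written, it does not change
how HDFS allocates blocks):

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: interleave records from each partition so that every block's worth
// of data contains one record from every partition.
public class InterleavedWriter {
  public static void main(String[] args) throws IOException {
    List<int[]> partitions = Arrays.asList(
        new int[]{1, 2, 3, 4}, new int[]{5, 6, 7, 8}, new int[]{9, 10, 11, 12});
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/interleaved"));
    int rounds = partitions.get(0).length;
    for (int i = 0; i < rounds; i++) {          // round-robin over partitions
      for (int[] partition : partitions) {
        out.writeInt(partition[i]);             // 1,5,9, 2,6,10, 3,7,11, 4,8,12
      }
    }
    out.close();
  }
}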

Is it feasible to change the default block creation policy in the current
implementation?

Regards,
Abhishek Das

On Tue, Feb 17, 2015 at 2:25 AM, Vinayakumar B 
wrote:

> Hi Abhishek,
> Are your partitions all the same size? If yes, then you can set that as the
> block size.
>
> If not, you can use the latest feature, variable block size,
> to handle your use case.
> You can close the current block after each partition's data is written and
> append to a new block for the next partition's data.
> This feature is not yet available in any release. Hope to see it in
> the upcoming 2.7 release. As of now you can try it in any of the trunk/branch-2
> builds.
>
> Hope this helps.
>
> -Vinay
> On Feb 17, 2015 8:30 AM, "Abhishek Das"  wrote:
>
> > Hi,
> >
> > I am new to this group. I had a question regarding block creation in
> > HDFS. By default, a file is split into multiple blocks of size equal
> > to the block size. I need to introduce a new block creation policy
> > into the system. In my case the main file is divided into multiple
> > partitions. My goal is to create the blocks so that data from each
> > partition of the file is represented in each block. Is it possible to
> > introduce the new policy? If yes, what would be the starting point in
> > the code I should look at?
> >
> > Regards,
> > Abhishek Das
> >
>


Re: max concurrent connection to HDFS name node

2015-02-17 Thread Colin P. McCabe
Hi Demai,

Nearly all input and output stream operations will talk directly to
the DN without involving the NN.  The NameNode is involved in metadata
operations such as renaming or opening files, not in reading data.
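
For example, in a simple read like the sketch below, only open() (and
the block-location lookups behind it) go to the NameNode; the data bytes
themselves are streamed from DataNodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // open() is a NameNode (metadata) operation: it looks up the block list.
    FSDataInputStream in = fs.open(new Path(args[0]));
    byte[] buf = new byte[64 * 1024];
    long total = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
      total += n;   // the bytes come from DataNodes; the NN is not on this path
    }
    System.out.println("read " + total + " bytes");
    in.close();
  }
}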

Hope this helps.

best,
Colin


On Thu, Feb 12, 2015 at 4:21 PM, Demai Ni  wrote:
> Colin,
>
> Thanks. 30~50K is smaller than I thought, though I understand that I
> shouldn't stress the traffic unnecessarily.
>
> If I can put my client (java/c) on a datanode and only read the local hdfs
> files (that is, the files that have their replicas on that datanode), is there
> an API I can use to talk directly to the DN, without stressing the NN?  Thanks
>
> Demai
>
> On Thu, Feb 12, 2015 at 2:05 PM, Colin McCabe 
> wrote:
>
>> The NN can do somewhere around 30,000 - 50,000 RPCs per second
>> currently, depending on configuration.  In general you do not want to
>> have extremely high NN RPC traffic, because it will slow things down.
>> You might consider re-architecting your application to do more DN
>> traffic and less NN traffic, if possible.  Hope that helps.
>>
>> best,
>> Colin
>>
>> On Tue, Feb 10, 2015 at 4:29 PM, Demai Ni  wrote:
>> > hi, folks,
>> >
>> > Is there a max limit on concurrent connections to a name node, or is
>> > there a best practice?
>> >
>> > My scenario is simple. A client (java/c++) program will open a connection
>> > through an hdfs api call, then open a few hdfs files, maybe read a bit of
>> > data, then close the connection. In some cases, the number of clients may
>> > be 50,000~100,000 concurrently. Is that number of connections acceptable?
>> >
>> > Thanks.
>> >
>> > Demai
>>


Jenkins build is back to normal : Hadoop-Hdfs-trunk #2039

2015-02-17 Thread Apache Jenkins Server
See 



Re: Block creation in HDFS

2015-02-17 Thread Vinayakumar B
Hi Abhishek,
Are your partitions all the same size? If yes, then you can set that as the
block size.

If not, you can use the latest feature, variable block size,
to handle your use case.
You can close the current block after each partition's data is written and
append to a new block for the next partition's data.
This feature is not yet available in any release. Hope to see it in
the upcoming 2.7 release. As of now you can try it in any of the trunk/branch-2
builds.
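
Roughly, the write pattern would be something like the sketch below.
This assumes the append overload taking EnumSet<CreateFlag> and the
CreateFlag.NEW_BLOCK flag added by HDFS-3689; exact method signatures
may differ in your build.

import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Sketch only: made-up partition data and path; assumes fs.defaultFS is HDFS.
public class VariableBlockWriter {
  public static void main(String[] args) throws Exception {
    byte[][] partitions = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12} };
    Path file = new Path("/tmp/partitioned");
    DistributedFileSystem dfs =
        (DistributedFileSystem) file.getFileSystem(new Configuration());

    // First partition: normal create, then close to end the current block.
    FSDataOutputStream out = dfs.create(file);
    out.write(partitions[0]);
    out.close();

    // Remaining partitions: append, but start a new block each time.
    for (int i = 1; i < partitions.length; i++) {
      out = dfs.append(file, EnumSet.of(CreateFlag.APPEND, CreateFlag.NEW_BLOCK),
          4096, null);
      out.write(partitions[i]);
      out.close();
    }
  }
}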

Hope this helps.

-Vinay
On Feb 17, 2015 8:30 AM, "Abhishek Das"  wrote:

> Hi,
>
> I am new to this group. I had a question regarding block creation in HDFS.
> By default, a file is split into multiple blocks of size equal to the block
> size. I need to introduce a new block creation policy into the system. In my
> case the main file is divided into multiple partitions. My goal is to
> create the blocks so that data from each partition of the file is
> represented in each block. Is it possible to introduce the new policy? If
> yes, what would be the starting point in the code I should look at?
>
> Regards,
> Abhishek Das
>


Jenkins build is back to normal : Hadoop-Hdfs-trunk-Java8 #97

2015-02-17 Thread Apache Jenkins Server
See