hadoop cluster with mixed servers (different memory, speed, etc.)

2015-09-17 Thread Demai Ni
hi, folks, I am wondering how a Hadoop cluster handles commodity hardware with different speed and capacity. This situation is happening and will probably become very common soon: a cluster starts with 100 machines, and in a couple of years another 100 machines are added. With Moore's law as an indicator,

Re: HDFS ShortCircuit Read on Mac?

2015-09-08 Thread Demai Ni
> HADOOP-11957 if you want to see the current progress. > --Chris Nauroth

HDFS ShortCircuit Read on Mac?

2015-09-08 Thread Demai Ni
hi, folks, wondering whether anyone has set up HDFS short-circuit read on a Mac? I installed Hadoop through Homebrew on the Mac. It is up and running, but I cannot configure "dfs.domain.socket.path" as instructed here: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
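
For reference, short-circuit reads need both the client-side flag and a UNIX domain socket path; the sketch below shows the two properties involved, set programmatically on a client Configuration (they normally live in hdfs-site.xml on both the DataNode and the client). The socket path /var/lib/hadoop-hdfs/dn_socket is only an illustrative choice, and per the reply above, domain-socket support on OS X was still being tracked in HADOOP-11957 at the time.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShortCircuitConf {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enable short-circuit local reads on the client side.
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        // UNIX domain socket shared between the DataNode and local clients;
        // the directory must exist and the same path must be set on the DataNode.
        conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}
```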

hadoop/hdfs cache question, do client processes share cache?

2015-08-11 Thread Demai Ni
hi, folks, I have a quick question about how HDFS handles caching. In this lab experiment, I have a 4-node Hadoop cluster (2.x) and each node has fairly large memory (96GB). I have a single HDFS file of 256MB, which also fits in one HDFS block. The local filesystem is Linux. Now from one of the

Re: hadoop/hdfs cache question, do client processes share cache?

2015-08-11 Thread Demai Ni
Ritesh, many thanks for your response. I just read through the centralized cache document. Thanks for the pointer. A couple of follow-up questions. First, centralized cache requires 'explicit' configuration, so by default there is no HDFS-managed cache? Will the caching occur at the local filesystem
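
On the explicit configuration mentioned above: centralized cache directives are managed through the hdfs cacheadmin CLI or the DistributedFileSystem API. Below is a minimal sketch using the Java API, assuming the default filesystem is HDFS; the pool name "testPool" and the file path are made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CacheDirectiveExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        // Create a cache pool (roughly: hdfs cacheadmin -addPool testPool).
        dfs.addCachePool(new CachePoolInfo("testPool"));

        // Ask the NameNode to pin the file's blocks into DataNode memory
        // (roughly: hdfs cacheadmin -addDirective -path /user/demai/testfile -pool testPool).
        long directiveId = dfs.addCacheDirective(
                new CacheDirectiveInfo.Builder()
                        .setPath(new Path("/user/demai/testfile"))
                        .setPool("testPool")
                        .build());
        System.out.println("Added cache directive " + directiveId);
        dfs.close();
    }
}
```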

Re: a non-commercial distribution of hadoop ecosystem?

2015-06-01 Thread Demai Ni
/installation info of the ODP. maybe I should google harder? :-) Demai On Mon, Jun 1, 2015 at 1:46 PM, Roman Shaposhnik r...@apache.org wrote: On Mon, Jun 1, 2015 at 1:37 PM, Demai Ni nid...@gmail.com wrote: My question is: besides the commercial distributions CDH (Cloudera), HDP (Hortonworks

Re: Connect c language with HDFS

2015-05-04 Thread Demai Ni
I would also suggest taking a look at https://issues.apache.org/jira/browse/HDFS-6994. I have been using libhdfs3 for a POC in the past few months, and highly recommend it. The only drawback is that libhdfs3 has not been formally committed into hadoop/hdfs yet. If you only want to play with hdfs, using

Re: Data locality

2015-03-02 Thread Demai Ni
hi, folks, I have a similar question. Is there an easy way to tell (from a user perspective) whether short-circuit read is enabled? thanks Demai On Mon, Mar 2, 2015 at 11:46 AM, Fei Hu hufe...@gmail.com wrote: Hi All, I developed a scheduler for data locality. Now I want to test the
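
One user-level check, sketched below under the assumption of a 2.x client: read a file that has a local replica and inspect the per-stream read statistics, which break out short-circuit bytes. The file path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class ShortCircuitCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] buf = new byte[4096];

        try (FSDataInputStream in = fs.open(new Path("/user/demai/testfile"))) {
            while (in.read(buf) > 0) {
                // Drain the file so the read statistics get populated.
            }
            if (in instanceof HdfsDataInputStream) {
                HdfsDataInputStream hin = (HdfsDataInputStream) in;
                // Non-zero short-circuit bytes means the fast path was actually used.
                System.out.println("short-circuit bytes read: "
                        + hin.getReadStatistics().getTotalShortCircuitBytesRead());
                System.out.println("total bytes read: "
                        + hin.getReadStatistics().getTotalBytesRead());
            }
        }
    }
}
```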

[HDFS] result order of getFileBlockLocations() and listFiles()?

2014-10-29 Thread Demai Ni
hi, Guys, I am trying to implement a simple program (not for production; experimental). It invokes FileSystem.listFiles() to get a list of files under an HDFS folder, and then uses FileSystem.getFileBlockLocations() to get the replica locations of each file's blocks. Since it is a controlled
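
A minimal sketch of combining the two calls; note that listFiles() returns LocatedFileStatus entries, which already carry block locations, so the separate getFileBlockLocations() call can often be skipped. The folder path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Recursively list files under the folder; each entry already
        // includes its block locations.
        RemoteIterator<LocatedFileStatus> it =
                fs.listFiles(new Path("/user/demai/data"), true);
        while (it.hasNext()) {
            LocatedFileStatus status = it.next();
            System.out.println(status.getPath());
            for (BlockLocation loc : status.getBlockLocations()) {
                // Hosts holding replicas of this block, plus its offset/length in the file.
                System.out.println("  offset=" + loc.getOffset()
                        + " len=" + loc.getLength()
                        + " hosts=" + String.join(",", loc.getHosts()));
            }
        }
    }
}
```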

Re: read from a hdfs file on the same host as client

2014-10-13 Thread Demai Ni
. This would allow the client to bypass the datanode and read the file directly. On Mon, Oct 13, 2014 at 11:58 AM, Demai Ni nid...@gmail.com wrote: hi, folks, a very simple question, looking forward to a couple of pointers. Let's say I have an HDFS file, testfile, which has only one block (256MB

hdfs: a C API call to getFileSize() through libhdfs or libhdfs3?

2014-10-02 Thread Demai Ni
hi, folks, To get the size of an HDFS file, the Java API has FileSystem#getFileStatus(PATH)#getLen(); now I am trying to use a C client to do the same thing. For a file on the local file system, I can grab the info like this: fseeko(file, 0, SEEK_END); size = ftello(file); But I can't find the SEEK_END

Re: Planning to propose Hadoop initiative to company. Need some inputs please.

2014-10-01 Thread Demai Ni
hi, glad to see another person moving from the mainframe world to the 'big' data one. I was in the same boat a few years back, after working on mainframes for 10+ years. Wilm got to the pointers already; I'd just like to chime in a bit from the mainframe side. The example of website usage is a very good

Re: conf.get(dfs.data.dir) return null when hdfs-site.xml doesn't set it explicitly

2014-09-09 Thread Demai Ni
On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni nid...@gmail.com wrote: hi, Bhooshan, thanks for your kind response. I run the code on one of the data nodes of my cluster, with only one Hadoop daemon running. I believe my Java client code connects to the cluster correctly, as I am able
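
One thing that often explains the null, sketched below: a client-side Configuration only sees the resource files on its own classpath, so the cluster's hdfs-site.xml has to be added explicitly, and on 2.x the property name is dfs.datanode.data.dir (dfs.data.dir is the deprecated 1.x name). The /etc/hadoop/conf path shown is a typical CDH location, used only as an illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class DataDirLookup {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Explicitly pull in the cluster's hdfs-site.xml; without this, only the
        // defaults on the client classpath are visible and the lookup may return null.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        // 2.x property name; fall back to the deprecated 1.x name if unset.
        String dataDirs = conf.get("dfs.datanode.data.dir");
        if (dataDirs == null) {
            dataDirs = conf.get("dfs.data.dir");
        }
        System.out.println("DataNode data dirs: " + dataDirs);
    }
}
```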

Re: conf.get(dfs.data.dir) return null when hdfs-site.xml doesn't set it explicitly

2014-09-08 Thread Demai Ni
properties are defined. Since this is a CDH cluster, you would probably be best served by asking on the CDH mailing list as to where the right path to these files is. HTH, Bhooshan On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni nid...@gmail.com wrote: hi, experts, I am trying to get the local

Re: conf.get(dfs.data.dir) return null when hdfs-site.xml doesn't set it explicitly

2014-09-08 Thread Demai Ni
probably find out on the Cloudera mailing list. HTH, Bhooshan On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni nid...@gmail.com wrote: hi, Bhooshan, thanks for your kind response. I run the code on one of the data nodes of my cluster, with only one Hadoop daemon running. I believe my Java client

Re: question about matching java API with libHDFS

2014-09-04 Thread Demai Ni
getBlockLocations(...). Is libhdfs designed to limit the access? Thanks Demai On Thu, Sep 4, 2014 at 2:36 AM, Liu, Yi A yi.a@intel.com wrote: You could refer to the header file "src/main/native/libhdfs/hdfs.h"; it lists the APIs in detail. Regards, Yi Liu

question about matching java API with libHDFS

2014-09-03 Thread Demai Ni
hi, folks, I am currently using Java to access HDFS. For example, I am using the API DFSClient.getNamenode().getBlockLocations(...) to retrieve file block information. Now I need to move the same logic into C/C++, so I am looking at libHDFS and this wiki page:

Re: Local file system to access hdfs blocks

2014-08-29 Thread Demai Ni
; these names are private to the HDFS system and users should not use them, right? But if you really want to know this, you can check the fsck code to see whether they are available; On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni nid...@gmail.com wrote: Stanley and all, thanks. I will write

Re: Local file system to access hdfs blocks

2014-08-28 Thread Demai Ni
of the block (on which DN), but there's no way to get the local address of that block file. On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni nid...@gmail.com wrote: Yehia, no problem at all. I really appreciate your willingness to help. Yeah, now I am able to get such information through two steps

Re: Local file system to access hdfs blocks

2014-08-27 Thread Demai Ni
such info in already? Demai on the run On Aug 26, 2014, at 9:14 PM, Stanley Shi s...@pivotal.io wrote: I am not sure this is what you want but you can try this shell command: find [DATANODE_DIR] -name [blockname] On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni nid...@gmail.com wrote: Hi, folks

Re: Local file system to access hdfs blocks

2014-08-27 Thread Demai Ni
. Hope it helps. Yehia On 27 August 2014 20:18, Demai Ni nid...@gmail.com wrote: Hi, Stanley, Many thanks. Your method works. For now, I have a two-step approach: 1) getFileBlockLocations to grab the HDFS BlockLocation[]; 2) use a local file system call (like the find command) to match the block
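
A sketch of the first step with block names included, which makes the find-based matching above easier: DFSClient#getLocatedBlocks exposes the blk_* names that appear in the DataNode's local block directories. DFSClient is a non-public client internal, so this is only an experiment, not a supported interface; the file path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class BlockNameLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String file = "/user/demai/testfile";
        long len = fs.getFileStatus(new Path(file)).getLen();

        // DFSClient is an internal class; this bypasses the public FileSystem API.
        DFSClient client = new DFSClient(fs.getUri(), conf);
        for (LocatedBlock blk : client.getLocatedBlocks(file, 0, len).getLocatedBlocks()) {
            // Prints e.g. blk_1073741825 -- the name to feed into
            // `find <dfs.datanode.data.dir> -name 'blk_...*'` on the DataNode host.
            System.out.println(blk.getBlock().getBlockName());
            for (DatanodeInfo dn : blk.getLocations()) {
                System.out.println("  replica on " + dn.getHostName());
            }
        }
        client.close();
    }
}
```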

Local file system to access hdfs blocks

2014-08-25 Thread Demai Ni
Hi, folks, New to this area; hoping to get a couple of pointers. I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3). I am wondering whether there is an interface to get each HDFS block's information in terms of the local file system. For example, I can use hadoop fsck