RandomSampler and TotalOrderPartitioner

2014-08-28 Thread Keren Ouaknine
Hello,

I am running a global sort (on Pigmix input data, size 600GB) based
on TotalOrderPartitioner. The best practice according to the literature
points to data sampling using RandomSampler. The query succeeds but takes a
very long time (7 hours) and that's because there is only one reducer
(which nullifies the point of using the above classes :) ).
I am trying to figure out what forces the # of reducers to be *one*, as I
defined them to be *400*. I looked into the documentation and in the code
of RandomSampler, there is a requirement which says:

// Set the path to the SequenceFile storing the sorted partition keyset. It
must be the case that for R reduces, there are R-1 keys in the SequenceFile.


​And therefore I sampled as follows:

InputSampler.Sampler<Text, Text> sampler = new
InputSampler.RandomSampler<Text, Text>(0.9, 399, 444);

Looking into my _partition file I can see there is only one partition which
explains the one reducer:
SEQ org.apache.hadoop.io.Text!org.apache.hadoop.io.NullWritable

I am wondering how come the partition file contains only one sample, though
I asked for 399 samples above?
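
For reference, a minimal sketch of how the sampler, the partition file and the
reducer count are usually wired together with the new mapreduce API; the paths,
input format and sampler parameters below are illustrative. Note that
InputSampler.writePartitionFile() takes the reducer count from the Job, so the
400 reduce tasks need to be configured before the partition file is written:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class GlobalSortSetup {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "global-sort");
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Reducer count must be set BEFORE writePartitionFile(): the sampler
    // writes numReduceTasks - 1 split keys into the partition file.
    job.setNumReduceTasks(400);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path("/tmp/_partitions"));

    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);
    InputSampler.writePartitionFile(job, sampler);

    // ... set mapper/reducer/output formats and submit the job as usual
  }
}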

Thanks for the help!!
Keren​


-- 
Keren Ouaknine
www.kereno.com


Re: Local file system to access hdfs blocks

2014-08-28 Thread Stanley Shi
As far as I know, there's no combination of hadoop APIs that can do that.
You can easily get the location of the block (on which DN), but there's no
way to get the local path of that block file.
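
For what it's worth, a minimal sketch of the getFileBlockLocations() step
discussed below (the path is illustrative); matching a block to its physical
file on a datanode still means searching that node's local data directories,
e.g. with find:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHosts {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/test.txt");          // illustrative path
    FileStatus st = fs.getFileStatus(p);
    // One BlockLocation per block: offset, length and the datanodes holding it
    for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
      System.out.println("offset=" + loc.getOffset()
          + " len=" + loc.getLength()
          + " hosts=" + java.util.Arrays.toString(loc.getHosts()));
    }
  }
}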



On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni nid...@gmail.com wrote:

 Yehia,

 No problem at all. I really appreciate your willingness to help. Yeah, now
 I am able to get such information through two steps: the first step
 will be either hadoop fsck or getFileBlockLocations(), and then I search
 the local filesystem; my cluster is using the default from CDH, which is
 /dfs/dn

 I would like to do it programmatically, so I am wondering whether someone has
 already done it, or, even better, whether there is a hadoop API call already
 implemented for this exact purpose.

 Demai


 On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater y.z.elsha...@gmail.com
 wrote:

 Hi Demai,

 Sorry, I missed that you are already tried this out. I think you can
 construct the block location on the local file system if you have the block
 pool id and the block id. If you are using cloudera distribution, the
 default location is under /dfs/dn ( the value of dfs.data.dir,
 dfs.datanode.data.dir configuration keys).

 Thanks
 Yehia


 On 27 August 2014 21:20, Yehia Elshater y.z.elsha...@gmail.com wrote:

 Hi Demai,

 You can use fsck utility like the following:

 hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks

 This will display all the information you need about the blocks of your
 file.

 Hope it helps.
 Yehia


 On 27 August 2014 20:18, Demai Ni nid...@gmail.com wrote:

 Hi, Stanley,

 Many thanks. Your method works. For now, I can use a two-step approach:
 1) getFileBlockLocations to grab hdfs BlockLocation[]
 2) use a local file system call (like the find command) to match the block to
 files on the local file system.

 Maybe there is an existing Hadoop API to return such info in already?

 Demai on the run

 On Aug 26, 2014, at 9:14 PM, Stanley Shi s...@pivotal.io wrote:

 I am not sure this is what you want but you can try this shell command:

 find [DATANODE_DIR] -name [blockname]


 On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni nid...@gmail.com wrote:

 Hi, folks,

 New in this area. Hopefully to get a couple pointers.

 I am using Centos and have Hadoop set up using cdh5.1(Hadoop 2.3)

 I am wondering whether there is an interface to get each hdfs block's
 information in terms of the local file system.

 For example, I can use Hadoop fsck /tmp/test.txt -files -blocks
 -racks to get blockID and its replica on the nodes, such as: repl =3[
 /rack/hdfs01, /rack/hdfs02...]

  With such info, is there a way to
 1) log in to hdfs01, and read the block directly at the local file system
 level?


 Thanks

 Demai on the run




 --
 Regards,
 *Stanley Shi,*







-- 
Regards,
*Stanley Shi,*


Re: What happens when .....?

2014-08-28 Thread Stanley Shi
Normally an MR job is used for batch processing, so I don't think this is a
good use case for MR.
Since you need to run the program periodically, you cannot submit a single
mapreduce job for this.
A possible way is to create a cron job that scans the folder size and submits
an MR job if necessary.
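
A hedged sketch of the folder-size check such a cron-launched driver could run
before deciding to submit anything (the path and threshold are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirSizeCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Total size in bytes of everything under the directory
    long bytes = fs.getContentSummary(new Path("/incoming")).getLength();
    long threshold = 10L * 1024 * 1024 * 1024;   // e.g. 10 GB
    if (bytes > threshold) {
      System.out.println("Directory holds " + bytes + " bytes, submit the MR job");
      // e.g. launch the job here via Job.waitForCompletion() or a shell call
    } else {
      System.out.println("Directory holds " + bytes + " bytes, nothing to do");
    }
  }
}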



On Wed, Aug 27, 2014 at 7:38 PM, Kandoi, Nikhil nikhil.kan...@emc.com
wrote:

 Hi All,



 I have a system where files are coming into hdfs at regular intervals and I
 perform an operation every time the directory size goes above a particular
 point.

 My question is: when I submit a map reduce job, would it only work on
 the files present at that point?



 Regards,
 Nikhil Kandoi






-- 
Regards,
*Stanley Shi,*


Re: Appending to HDFS file

2014-08-28 Thread Stanley Shi
You should not use this method:
FSDataOutputStream fp = fs.create(pt, true)

Here's the java doc for this create method:

  /**
   * Create an FSDataOutputStream at the indicated Path.
   * @param f the file to create
   * @param overwrite if a file with this name already exists, then if true,
   *   the file will be overwritten, and if false an exception will be
   *   thrown.
   */
  public FSDataOutputStream create(Path f, boolean overwrite)
  throws IOException {
    return create(f, overwrite,
      getConf().getInt("io.file.buffer.size", 4096),
      getDefaultReplication(f),
      getDefaultBlockSize(f));
  }


On Wed, Aug 27, 2014 at 2:12 PM, rab ra rab...@gmail.com wrote:


 hello

 Here is the code snippet I use to append:

 def outFile = "${outputFile}.txt"

 Path pt = new Path("${hdfsName}/${dir}/${outFile}")

 def fs = org.apache.hadoop.fs.FileSystem.get(configuration);

 FSDataOutputStream fp = fs.create(pt, true)

 fp << "${key} ${value}\n"
 On 27 Aug 2014 09:46, Stanley Shi s...@pivotal.io wrote:

 would you please paste the code in the loop?


 On Sat, Aug 23, 2014 at 2:47 PM, rab ra rab...@gmail.com wrote:

 Hi

 By default, it is true in hadoop 2.4.1. Nevertheless, I have set it to
 true explicitly in hdfs-site.xml. Still, I am not able to achieve append.

 Regards
 On 23 Aug 2014 11:20, Jagat Singh jagatsi...@gmail.com wrote:

 What is the value of dfs.support.append in hdfs-site.xml?


 https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml




 On Sat, Aug 23, 2014 at 1:41 AM, rab ra rab...@gmail.com wrote:

 Hello,

 I am currently using Hadoop 2.4.1. I am running an MR job using the hadoop
 streaming utility.

 The executable needs to write a large amount of information to a file.
 However, this write is not done in a single attempt. The file needs to be
 appended with the streams of information generated.

 In the code, inside a loop, I open a file in hdfs and append some
 information. This is not working and I see only the last write.

 How do I accomplish an append operation in hadoop? Can anyone share a
 pointer?




 regards
 Bala





 --
 Regards,
 *Stanley Shi,*




-- 
Regards,
*Stanley Shi,*


libhdfs result in JVM crash issue, please help me

2014-08-28 Thread Vincent,Wei
All

I am using libhdfs and I need a call chain like the following. When the JNI
call returns, it results in a crash in the JVM. The attachment has the
detailed information.

JAVA -> (call) -> JNI -> (call) -> C LIB -> (call) -> libhdfs

Crash info


#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f3271ad3159, pid=9880, tid=139854651725568
#
# JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build
1.7.0_51-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x7d6159]  Monitor::ILock(Thread*)+0x79
#
# Core dump written. Default location: /home/haduser/core or core.9880
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

---  T H R E A D  ---

Current thread is native thread

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR),
si_addr=0x0168

Registers:
RAX=0x7f326c1eb201, RBX=0x7f326c011b70, RCX=0x7f326c1eb201,
RDX=0x
RSP=0x7f3272d60e00, RBP=0x7f3272d60e20, RSI=0x,
RDI=0x7f326c011b70
R8 =0x, R9 =0x0001, R10=0x,
R11=0x0202
R12=0x, R13=0x7f3272134a60, R14=0x,
R15=0x
RIP=0x7f3271ad3159, EFLAGS=0x00010202,
CSGSFS=0x0033, ERR=0x0004
  TRAPNO=0x000e

Top of Stack: (sp=0x7f3272d60e00)
0x7f3272d60e00:   7f326c011b70 
0x7f3272d60e10:    0004
0x7f3272d60e20:   7f3272d60e40 7f3271ad34cf
0x7f3272d60e30:   7f326c0159e8 7f3272b562c0
0x7f3272d60e40:   7f3272d60e50 7f3271c98409
0x7f3272d60e50:   7f3272d60e70 7f327192c67d
0x7f3272d60e60:    7f326c0159e8
0x7f3272d60e70:   7f326c27f3e0 7f326410ea14
0x7f3272d60e80:   000500010002 7f3272105b48
0x7f3272d60e90:    0030
0x7f3272d60ea0:   7f3272d61a10 7f3272945c83
0x7f3272d60eb0:   7f3272d61700 
0x7f3272d60ec0:    7fff7a8c5670
0x7f3272d60ed0:   7f3272d619c0 
0x7f3272d60ee0:   0003 7f3272945ea8
0x7f3272d60ef0:    7f3272d61700
0x7f3272d60f00:    
0x7f3272d60f10:    
0x7f3272d60f20:    
0x7f3272d60f30:    
0x7f3272d60f40:    
0x7f3272d60f50:    
0x7f3272d60f60:    
0x7f3272d60f70:    
0x7f3272d60f80:    6a29d305af3e5c9c
0x7f3272d60f90:   7fff7a8c5670 7f3272d619c0
0x7f3272d60fa0:    0003
0x7f3272d60fb0:   944d36a9b2de5c9c 944d362d13fe5c9c
0x7f3272d60fc0:    
0x7f3272d60fd0:    
0x7f3272d60fe0:    
0x7f3272d60ff0:    7f32722573fd

Instructions: (pc=0x7f3271ad3159)
0x7f3271ad3139:   9f c6 40 80 fe 00 74 01 f0 48 0f b1 13 48 39 c1
0x7f3271ad3149:   74 d1 48 89 c1 f6 c1 01 74 d5 4c 89 f6 48 89 df
0x7f3271ad3159:   4d 8b a6 68 01 00 00 e8 5b fc ff ff 85 c0 75 b3
0x7f3271ad3169:   41 c7 44 24 20 00 00 00 00 41 83 7d 00 01 7e 05

Register to memory mapping:

RAX=0x7f326c1eb201 is an unknown value
RBX=0x7f326c011b70 is an unknown value
RCX=0x7f326c1eb201 is an unknown value
RDX=0x is an unknown value
RSP=0x7f3272d60e00 is an unknown value
RBP=0x7f3272d60e20 is an unknown value
RSI=0x is an unknown value
RDI=0x7f326c011b70 is an unknown value
R8 =0x is an unknown value
R9 =0x0001 is an unknown value
R10=0x is an unknown value
R11=0x0202 is an unknown value
R12=0x is an unknown value
R13=0x7f3272134a60: offset 0xe37a60 in
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so at 0x7f32712fd000
R14=0x is an unknown value
R15=0x is an unknown value


Stack: [0x7f3272c61000,0x7f3272d62000],  sp=0x7f3272d60e00,
 free space=1023k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
code)
V  [libjvm.so+0x7d6159]  Monitor::ILock(Thread*)+0x79
V  [libjvm.so+0x7d64cf]  Monitor::lock_without_safepoint_check()+0x2f
V  [libjvm.so+0x99b409]  VM_Exit::wait_if_vm_exited()+0x39
V  [libjvm.so+0x62f67d]  jni_DetachCurrentThread+0x3d
-- 
BR,

Vincent.Wei


hs_err_pid9880.log
Description: Binary data


Re: Hadoop 2.5.0 - HDFS browser-based file view

2014-08-28 Thread Stanley Shi
Normally files in HDFS are intended to be quite big, so they are not very
practical to show in the browser.


On Fri, Aug 22, 2014 at 10:56 PM, Brian C. Huffman 
bhuff...@etinternational.com wrote:

 All,

 I noticed that on Hadoop 2.5.0, when browsing the HDFS filesystem on
 port 50070, you can't view a file in the browser. Clicking a file gives a
 little popup with metadata and a download link. Can HDFS be configured to
 show plaintext file contents in the browser?

 Thanks,
 Brian




-- 
Regards,
*Stanley Shi,*


RE: Appending to HDFS file

2014-08-28 Thread Liu, Yi A
Right, please use FileSystem#append
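
A minimal sketch of the append-based version (the path is illustrative; it
assumes append is enabled on the cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path pt = new Path("/user/foo/out.txt");     // illustrative path
    // Append to the file if it already exists, otherwise create it once
    FSDataOutputStream out = fs.exists(pt) ? fs.append(pt) : fs.create(pt, false);
    out.write("key value\n".getBytes("UTF-8"));
    out.close();
  }
}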

From: Stanley Shi [mailto:s...@pivotal.io]
Sent: Thursday, August 28, 2014 2:18 PM
To: user@hadoop.apache.org
Subject: Re: Appending to HDFS file

You should not use this method:
FSDataOutputStream fp = fs.create(pt, true)

Here's the java doc for this create method:

  /**
   * Create an FSDataOutputStream at the indicated Path.
   * @param f the file to create
   * @param overwrite if a file with this name already exists, then if true,
   *   the file will be overwritten, and if false an exception will be thrown.
   */
  public FSDataOutputStream create(Path f, boolean overwrite)
  throws IOException {
return create(f, overwrite,
  getConf().getInt("io.file.buffer.size", 4096),
  getDefaultReplication(f),
  getDefaultBlockSize(f));
  }

On Wed, Aug 27, 2014 at 2:12 PM, rab ra 
rab...@gmail.commailto:rab...@gmail.com wrote:

hello

Here is the code snippet I use to append:

def outFile = "${outputFile}.txt"

Path pt = new Path("${hdfsName}/${dir}/${outFile}")

def fs = org.apache.hadoop.fs.FileSystem.get(configuration);

FSDataOutputStream fp = fs.create(pt, true)

fp << "${key} ${value}\n"
On 27 Aug 2014 09:46, Stanley Shi s...@pivotal.iomailto:s...@pivotal.io 
wrote:
would you please paste the code in the loop?

On Sat, Aug 23, 2014 at 2:47 PM, rab ra 
rab...@gmail.commailto:rab...@gmail.com wrote:

Hi

By default, it is true in hadoop 2.4.1. Nevertheless, I have set it to true 
explicitly in hdfs-site.xml. Still, I am not able to achieve append.

Regards
On 23 Aug 2014 11:20, Jagat Singh 
jagatsi...@gmail.commailto:jagatsi...@gmail.com wrote:
What is the value of dfs.support.append in hdfs-site.xml?

https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml



On Sat, Aug 23, 2014 at 1:41 AM, rab ra 
rab...@gmail.commailto:rab...@gmail.com wrote:
Hello,

I am currently using Hadoop 2.4.1.I am running a MR job using hadoop streaming 
utility.

The executable needs to write large amount of information in a file. However, 
this write is not done in single attempt. The file needs to be appended with 
streams of information generated.

In the code, inside a loop, I open a file in hdfs, appends some information. 
This is not working and I see only the last write.

How do I accomplish append operation in hadoop? Can anyone share a pointer to 
me?




regards
Bala




--
Regards,
Stanley Shi,



--
Regards,
Stanley Shi,


replication factor in hdfs-site.xml

2014-08-28 Thread Satyam Singh

Hi Users,



I want to know the behaviour of a hadoop cluster for writing/reading in the
following replication cases:


1. replication=2 and in a cluster (3 datanodes + namenode) one datanode
goes down.
2. replication=2 and in a cluster (3 datanodes + namenode) 2 datanodes go
down.



BR,
Satyam


AW: Running job issues

2014-08-28 Thread Blanca Hernandez
Thanks, it fixed my problem!

Von: Arpit Agarwal [mailto:aagar...@hortonworks.com]
Gesendet: Donnerstag, 28. August 2014 01:41
An: user@hadoop.apache.org
Betreff: Re: Running job issues

Susheel is right. I've fixed the typo on the wiki page.

On Wed, Aug 27, 2014 at 12:28 AM, Susheel Kumar Gadalay 
skgada...@gmail.commailto:skgada...@gmail.com wrote:
You have to use this command to format

hdfs namenode -format

not hdfs dfs -format

On 8/27/14, Blanca Hernandez 
blanca.hernan...@willhaben.atmailto:blanca.hernan...@willhaben.at wrote:
 Hi, thanks for your answers.

 Sorry, I forgot to add it, I couldn't run the command either:

 C:\development\tools\hadoop>%HADOOP_PREFIX%\bin\hdfs dfs -format
 -format: Unknown command

 C:\development\tools\hadoop>echo %HADOOP_PREFIX%
 C:\development\tools\hadoop

 Using the -help command, there is no format param.
 My Hadoop version: 2.4.0 (the latest version supported by mongo-hadoop).

 Best regards,

 Blanca


 Von: Arpit Agarwal 
 [mailto:aagar...@hortonworks.commailto:aagar...@hortonworks.com]
 Gesendet: Dienstag, 26. August 2014 21:39
 An: user@hadoop.apache.orgmailto:user@hadoop.apache.org
 Betreff: Re: Running job issues

 And the namenode does not even start: 14/08/26 12:01:09 WARN
 namenode.FSNamesystem: Encountered exception loading fsimage
 java.io.IOException: NameNode is not formatted.

 Have you formatted HDFS (step 3.4)?

 On Tue, Aug 26, 2014 at 3:08 AM, Blanca Hernandez
 blanca.hernan...@willhaben.at
 wrote:
 Hi!

 I have just installed hadoop on my windows x64 machine. I followed carefully
 the instructions in https://wiki.apache.org/hadoop/Hadoop2OnWindows but in
 the 3.5 and 3.6 points I have some problems I cannot handle.

 %HADOOP_PREFIX%\sbin\start-dfs.cmd


 The datanode can not connect: 14/08/26 12:01:30 WARN datanode.DataNode:
 Problem connecting to server:
 0.0.0.0/0.0.0.0:19000
 And the namenode does not even start: 14/08/26 12:01:09 WARN
 namenode.FSNamesystem: Encountered exception loading fsimage
 java.io.IOException: NameNode is not formatted.

 Trying to get running the mapreduce-example indicated in the pont 3.6, I get
 the connection exception again. Tha brings me to the
 http://wiki.apache.org/hadoop/ConnectionRefused page where the exception in
 explained. So I guess I have some misunderstandings with the configuration.
 I tried to find information about that and found the page
 http://wiki.apache.org/hadoop/HowToConfigure but is still not very clear for
 me.

 I attach my config files, the ones I modified maybe you can help me out…

 Many thanks!!







RE: replication factor in hdfs-site.xml

2014-08-28 Thread Liu, Yi A
For #1, since you still have 2 datanodes alive and the replication is 2,
writing will succeed. (Reads will succeed as well.)
For #2, you now only have 1 datanode and the replication is 2, so the initial
write will succeed, but at some point pipeline recovery will fail.
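
As a side note, a small sketch for checking and changing a file's replication
factor programmatically (the path is illustrative; the cluster-wide default
comes from dfs.replication in hdfs-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/test.txt");          // illustrative path
    short current = fs.getFileStatus(p).getReplication();
    System.out.println("current replication = " + current);
    // Ask the namenode to bring this file to 2 replicas
    fs.setReplication(p, (short) 2);
  }
}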

Regards,
Yi Liu


-Original Message-
From: Satyam Singh [mailto:satyam.si...@ericsson.com] 
Sent: Thursday, August 28, 2014 3:01 PM
To: user@hadoop.apache.org
Subject: replication factor in hdfs-site.xml

Hi Users,



I want to know the behaviour of a hadoop cluster for writing/reading in the
following replication cases:

1. replication=2 and in a cluster (3 datanodes + namenode) one datanode goes down.
2. replication=2 and in a cluster (3 datanodes + namenode) 2 datanodes go down.


BR,
Satyam


RE: Appending to HDFS file

2014-08-28 Thread rab ra
Thank you all,

It works now

Regards
rab
On 28 Aug 2014 12:06, Liu, Yi A yi.a@intel.com wrote:

  Right, please use FileSystem#append



 *From:* Stanley Shi [mailto:s...@pivotal.io]
 *Sent:* Thursday, August 28, 2014 2:18 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: Appending to HDFS file



 You should not use this method:

 FSDataOutputStream fp = fs.create(pt, true)



 Here's the java doc for this create method:



   /**

* Create an FSDataOutputStream at the indicated Path.

* @param f the file to create

* @*param overwrite if a file with this name already exists, then if
 true,*

*   the file will be overwritten, and if false an exception will be
 thrown.

*/

   public FSDataOutputStream create(Path f, boolean overwrite)

   throws IOException {

 return create(f, overwrite,

   getConf().getInt("io.file.buffer.size", 4096),

   getDefaultReplication(f),

   getDefaultBlockSize(f));

   }



 On Wed, Aug 27, 2014 at 2:12 PM, rab ra rab...@gmail.com wrote:


 hello

 Here is the code snippet I use to append:

 def outFile = "${outputFile}.txt"

 Path pt = new Path("${hdfsName}/${dir}/${outFile}")

 def fs = org.apache.hadoop.fs.FileSystem.get(configuration);

 FSDataOutputStream fp = fs.create(pt, true)

 fp << "${key} ${value}\n"

 On 27 Aug 2014 09:46, Stanley Shi s...@pivotal.io wrote:

 would you please paste the code in the loop?



 On Sat, Aug 23, 2014 at 2:47 PM, rab ra rab...@gmail.com wrote:

 Hi

 By default, it is true in hadoop 2.4.1. Nevertheless, I have set it to
 true explicitly in hdfs-site.xml. Still, I am not able to achieve append.

 Regards

 On 23 Aug 2014 11:20, Jagat Singh jagatsi...@gmail.com wrote:

 What is the value of dfs.support.append in hdfs-site.xml?




 https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml







 On Sat, Aug 23, 2014 at 1:41 AM, rab ra rab...@gmail.com wrote:

 Hello,



 I am currently using Hadoop 2.4.1.I am running a MR job using hadoop
 streaming utility.



 The executable needs to write large amount of information in a file.
 However, this write is not done in single attempt. The file needs to be
 appended with streams of information generated.



 In the code, inside a loop, I open a file in hdfs, appends some
 information. This is not working and I see only the last write.



 How do I accomplish append operation in hadoop? Can anyone share a pointer
 to me?









 regards

 Bala







 --

 Regards,

 *Stanley Shi,*





 --

 Regards,

 *Stanley Shi,*




Re: libhdfs result in JVM crash issue, please help me

2014-08-28 Thread Vincent,Wei
#0  0x7f1e3872c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x7f1e3872c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f1e3872fb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x7f1e380a4405 in os::abort(bool) () from
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#3  0x7f1e38223347 in VMError::report_and_die() () from
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#4  0x7f1e380a8d8f in JVM_handle_linux_signal () from
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#5  signal handler called
#6  0x7f1e38066159 in Monitor::ILock(Thread*) () from
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#7  0x7f1e380664cf in Monitor::lock_without_safepoint_check() () from
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#8  0x7f1e3822b409 in VM_Exit::wait_if_vm_exited() () from
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#9  0x7f1e37ebf67d in jni_DetachCurrentThread () from
/usr/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
#10 0x7f1e26544a14 in hdfsThreadDestructor (v=optimized out) at
/home/haduser/hadoop-2.2.0-src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:84
#11 0x7f1e38ed8c83 in __nptl_deallocate_tsd () from
/lib/x86_64-linux-gnu/libpthread.so.0
#12 0x7f1e38ed8ea8 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#13 0x7f1e387ea3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x in ?? ()



2014-08-28 14:28 GMT+08:00 Vincent,Wei weikun0...@gmail.com:


 All

 I am using libhdfs, I need some usage like following ,and when the JNI
 call return, it had result in some Crash in JVM, Attachment is the detail
 information.

 JAVA


 Call

 JNI



 Call

 C LIB



 Call

 Libhdfs

 Crash info


 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x7f3271ad3159, pid=9880, tid=139854651725568
 #
 # JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build
 1.7.0_51-b13)
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode
 linux-amd64 compressed oops)
 # Problematic frame:
 # V  [libjvm.so+0x7d6159]  Monitor::ILock(Thread*)+0x79
 #
 # Core dump written. Default location: /home/haduser/core or core.9880
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #

 ---  T H R E A D  ---

 Current thread is native thread

 siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR),
 si_addr=0x0168

 Registers:
 RAX=0x7f326c1eb201, RBX=0x7f326c011b70, RCX=0x7f326c1eb201,
 RDX=0x
 RSP=0x7f3272d60e00, RBP=0x7f3272d60e20, RSI=0x,
 RDI=0x7f326c011b70
 R8 =0x, R9 =0x0001, R10=0x,
 R11=0x0202
 R12=0x, R13=0x7f3272134a60, R14=0x,
 R15=0x
 RIP=0x7f3271ad3159, EFLAGS=0x00010202,
 CSGSFS=0x0033, ERR=0x0004
   TRAPNO=0x000e

 Top of Stack: (sp=0x7f3272d60e00)
 0x7f3272d60e00:   7f326c011b70 
 0x7f3272d60e10:    0004
 0x7f3272d60e20:   7f3272d60e40 7f3271ad34cf
 0x7f3272d60e30:   7f326c0159e8 7f3272b562c0
 0x7f3272d60e40:   7f3272d60e50 7f3271c98409
 0x7f3272d60e50:   7f3272d60e70 7f327192c67d
 0x7f3272d60e60:    7f326c0159e8
 0x7f3272d60e70:   7f326c27f3e0 7f326410ea14
 0x7f3272d60e80:   000500010002 7f3272105b48
 0x7f3272d60e90:    0030
 0x7f3272d60ea0:   7f3272d61a10 7f3272945c83
 0x7f3272d60eb0:   7f3272d61700 
 0x7f3272d60ec0:    7fff7a8c5670
 0x7f3272d60ed0:   7f3272d619c0 
 0x7f3272d60ee0:   0003 7f3272945ea8
 0x7f3272d60ef0:    7f3272d61700
 0x7f3272d60f00:    
 0x7f3272d60f10:    
 0x7f3272d60f20:    
 0x7f3272d60f30:    
 0x7f3272d60f40:    
 0x7f3272d60f50:    
 0x7f3272d60f60:    
 0x7f3272d60f70:    
 0x7f3272d60f80:    6a29d305af3e5c9c
 0x7f3272d60f90:   7fff7a8c5670 7f3272d619c0
 0x7f3272d60fa0:    0003
 0x7f3272d60fb0:   944d36a9b2de5c9c 944d362d13fe5c9c
 0x7f3272d60fc0:    
 0x7f3272d60fd0:    
 0x7f3272d60fe0:    
 0x7f3272d60ff0:    7f32722573fd

 

Error could only be replicated to 0 nodes instead of minReplication (=1)

2014-08-28 Thread Jakub Stransky
Hello,
we are using Hadoop 2.2.0 (HDP 2.0) and avro 1.7.4, running on CentOS 6.3.

I am facing the following issue when using AvroMultipleOutputs with dynamic
output files. My M/R job works fine for a smaller amount of data, or at
least the error hasn't appeared there so far. With a bigger amount of data I
am getting the following error back on the console:

Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/data/pif-dob-categorize/2014/08/26/14/_temporary/1/_temporary/attempt_1409147867302_0090_r_00_0/HISTORY/20140216/64619-r-0.avro
could only be replicated to 0 nodes instead of minReplication (=1).  There
are 2 datanode(s) running and no node(s) are excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2503)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)

at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1231)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)

I checked that:
- the cluster is not partitioned - the network is fine
- the HDFS cluster has enough capacity - actually just 8% used
- dfs.reserved is set to 1, but 80GB is still free.

Our cluster is small, for development purposes, just to clarify.

The DataNode log contains a lot of the following errors:
2014-08-28 06:57:22,585 ERROR datanode.DataNode (DataXceiver.java:run(225))
- bd-prg-dev1-dn1.corp.ncr.com:50010:DataXceiver error processing
WRITE_BLOCK operation  src: /153.86.209.223:47123 dest: /
153.86.209.223:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:435)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:693)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:569)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:744)


The NameNode contains plenty of errors of type:

2014-08-28 06:58:19,904 WARN  blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(295)) - Not able to place
enough replicas, still in need of 2 to reach 2
For more information, please enable DEBUG log level on
org.apache.commons.logging.impl.Log4JLogger
2014-08-28 06:58:19,905 WARN  blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(295)) - Not able to place
enough replicas, still in need of 2 to reach 2
For more 

Hadoop on Windows 8 with Java 8

2014-08-28 Thread Ruebenacker, Oliver A

 Hello,

  I can't find any information on how possible or difficult it is to install 
Hadoop as a single node on Windows 8 running Oracle Java 8. The tutorial on 
Hadoop 2 on Windows (http://wiki.apache.org/hadoop/Hadoop2OnWindows) mentions 
neither Windows 8 nor Java 8. Is there anything known about this?

  Thanks!

 Best,
 Oliver

--
Oliver Ruebenacker
Solutions Architect at Altisource Labs (http://www.altisourcelabs.com/)
Be always grateful, but never satisfied.



RE: Hadoop on Windows 8 with Java 8

2014-08-28 Thread Liu, Yi A
Currently Hadoop doesn't officially support Java 8.

Regards,
Yi Liu

From: Ruebenacker, Oliver A [mailto:oliver.ruebenac...@altisource.com]
Sent: Thursday, August 28, 2014 8:46 PM
To: user@hadoop.apache.org
Subject: Hadoop on Windows 8 with Java 8


 Hello,

  I can't find any information on how possible or difficult it is to install 
Hadoop as a single node on Windows 8 running Oracle Java 8. The tutorial on 
Hadoop 2 on Windows (http://wiki.apache.org/hadoop/Hadoop2OnWindows) mentions 
neither Windows 8 nor Java 8. Is there anything known about this?

  Thanks!

 Best,
 Oliver

--
Oliver Ruebenacker
Solutions Architect at Altisource Labs (http://www.altisourcelabs.com/)
Be always grateful, but never satisfied.



Re: What happens when .....?

2014-08-28 Thread Eric Payne
Or, maybe have a look at Apache Falcon:
Falcon - Apache Falcon - Data management and processing platform
http://falcon.incubator.apache.org/

 From: Stanley Shi s...@pivotal.io
To: user@hadoop.apache.org user@hadoop.apache.org 
Sent: Thursday, August 28, 2014 1:15 AM
Subject: Re: What happens when .?
  


Normally MR job is used for batch processing. So I don't think this is a good 
use case here for MR.
Since you need to run the program periodically, you cannot submit a single 
mapreduce job for this.  
An possible way is to create a cron job to scan the folder size and submit a 
MR job if necessary;





On Wed, Aug 27, 2014 at 7:38 PM, Kandoi, Nikhil nikhil.kan...@emc.com wrote:

Hi All,
  
I have a system where files are coming in hdfs at regular intervals and I 
perform an operation everytime the directory size goes above a particular 
point.
My Question is that when I submit a map reduce job, would it only work on the 
files present at that point ?? 
 
Regards,
Nikhil Kandoi

 
 



-- 

Regards,
Stanley Shi,
 



Re: What happens when .....?

2014-08-28 Thread Mahesh Khandewal
unsubscribe


On Thu, Aug 28, 2014 at 6:42 PM, Eric Payne eric.payne1...@yahoo.com
wrote:

 Or, maybe have a look at Apache Falcon:
 Falcon - Apache Falcon - Data management and processing platform
 http://falcon.incubator.apache.org/








*From:* Stanley Shi s...@pivotal.io
 *To:* user@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Thursday, August 28, 2014 1:15 AM
 *Subject:* Re: What happens when .?

 Normally MR job is used for batch processing. So I don't think this is a
 good use case here for MR.
 Since you need to run the program periodically, you cannot submit a single
 mapreduce job for this.
 An possible way is to create a cron job to scan the folder size and submit
 a MR job if necessary;



 On Wed, Aug 27, 2014 at 7:38 PM, Kandoi, Nikhil nikhil.kan...@emc.com
 wrote:

 Hi All,

 I have a system where files are coming in hdfs at regular intervals and I
 perform an operation everytime the directory size goes above a particular
 point.
 My Question is that when I submit a map reduce job, would it only work on
 the files present at that point ??

 Regards,
 Nikhil Kandoi






 --
 Regards,
 *Stanley Shi,*






hadoop installation error: localhost: ssh: connect to host localhost port 22: connection refused

2014-08-28 Thread Li Chen
Can anyone please help me with this installation error?

After I type start-yarn.sh :

starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-xx.out
localhost: ssh: connect to host localhost port 22: connection refused


when I ran jps to check, only Jps and ResourceManager were shown -
NodeManager, NameNode and DataNode were not shown.


Re: hadoop installation error: localhost: ssh: connect to host localhost port 22: connection refused

2014-08-28 Thread Ritesh Kumar Singh
try 'ssh localhost' and show the output


On Thu, Aug 28, 2014 at 7:55 PM, Li Chen ahli1...@gmail.com wrote:

 Can anyone please help me with this installation error?

 After I type start-yarn.sh :

 starting yarn daemons
 starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-xx.out
 localhost: ssh: connect to host localhost port 22: connection refused


 when I ran jps to check, only Jps and ResourceManager were shown -
 NodeManager, NameNode and DataNode were not shown.



Re: Need some tutorials for Mapreduce written in Python

2014-08-28 Thread Amar Singh
Thank you to everyone who responded to this thread. I got a couple of good
pointers and some good online courses to explore to get a fundamental
understanding of things.

Thanks
Amar


On Thu, Aug 28, 2014 at 10:15 AM, Sriram Balachander 
sriram.balachan...@gmail.com wrote:

 Hadoop The Definitive Guide, Hadoop in action are good books and the
 course in edureka is also good.

 Regards
 Sriram


 On Wed, Aug 27, 2014 at 9:25 PM, thejas prasad thejch...@gmail.com
 wrote:

 Are there any books for this as well?



 On Wed, Aug 27, 2014 at 8:30 PM, Marco Shaw marco.s...@gmail.com wrote:

 You might want to consider the Hadoop course on udacity.com.  I think
 it provides a decent foundation to Hadoop/MapReduce with a focus on Python
 (using the streaming API like Sebastiano mentions).

 Marco


 On Wed, Aug 27, 2014 at 3:13 PM, Amar Singh amarsingh...@gmail.com
 wrote:

 Hi Users,
 I am new to the big data world and was in the process of reading some material
 on writing mapreduce using Python.

 Any links or pointers in that direction will be really helpful.







ApplicationMaster link on cluster web page does not work

2014-08-28 Thread Margusja

Hi

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/user</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
  <name>yarn.application.classpath</name>
  <value>
/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,/home/hduser/mahout-1.0-snapshot/math/target/*
  </value>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>0.0.0.0:8030</value>
</property>

<property>
  <name>yarn.web-proxy.address</name>
  <value>vm38.dbweb.ee:8089</value>
</property>

</configuration>

When I click a job's ApplicationMaster link
(http://vm38.dbweb.ee:8089/proxy/application_1409246753441_0001/) on the
cluster's "all applications" list I will get:


   HTTP ERROR 404

Problem accessing /proxy/application_1409246753441_0001/mapreduce. Reason:

NOT_FOUND


Powered by Jetty://

I use yarn.web-proxy because without yarn.web-proxy, when I click the
ApplicationMaster link I will have loads of tcp connections in ESTABLISHED
state.

In yarn-proxy log there is nothing interesting

--
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)



Re: ApplicationMaster link on cluster web page does not work

2014-08-28 Thread Margusja

More information

after I started resourcemanager
[root@vm38 ~]# /etc/init.d/hadoop-yarn-resourcemanager start
Starting Hadoop resourcemanager:   [  OK ]

and I open cluster web interface there is some tcp connections to 8088:

[root@vm38 ~]# netstat -np | grep 8088
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:61120ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:64412ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:50139ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:64407ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:58817ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:52250ESTABLISHED 15723/java


this is ok

Now I started a new map reduce job and got the tracking url
http://server:8088/proxy/application_1409250808355_0001/


Now there are loads of connections and the tracking url does not respond:
tcp0  0 :::90.190.106.33:46910 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:35984 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:37559  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:44154 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:45417 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:57294 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:47949  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55330 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp  432  0 :::90.190.106.33:8088 
:::90.190.106.33:45467  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:58405 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55580 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:51992  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:44578 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:38686  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:39242 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55916 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:48064 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:35148 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:42638 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:50836 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:44789  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55051 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:39207  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:53781 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:54180 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:56344 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:52707 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:52274 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:45417  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:46627  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:44583  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:40928 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:47014 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:60939 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:52274  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:38391 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:47321  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:36186  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:39160 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:54352 

Re: ApplicationMaster link on cluster web page does not work

2014-08-28 Thread Margusja
I moved the resourcemanager to another server and it works. I guess I have
some network misrouting there :)


Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)

On 28/08/14 21:39, Margusja wrote:

More information

after I started resourcemanager
[root@vm38 ~]# /etc/init.d/hadoop-yarn-resourcemanager start
Starting Hadoop resourcemanager:   [  OK ]

and I open cluster web interface there is some tcp connections to 8088:

[root@vm38 ~]# netstat -np | grep 8088
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:61120ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:64412ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:50139ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:64407ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:58817ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::84.50.21.39:52250ESTABLISHED 15723/java


this is ok

Now I started new map reduce job and got tracking url 
http://server:8088/proxy/application_1409250808355_0001/


now there are loads of connections and tracking url does not response:
tcp0  0 :::90.190.106.33:46910 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:35984 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:37559  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:44154 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:45417 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:57294 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:47949  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55330 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp  432  0 :::90.190.106.33:8088 
:::90.190.106.33:45467  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:58405 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55580 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:51992  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:44578 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:38686  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:39242 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55916 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:48064 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:35148 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:42638 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:50836 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:44789  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:55051 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:39207  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:53781 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:54180 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:56344 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:52707 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:52274 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:45417  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:46627  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:44583  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:40928 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:47014 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:60939 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:8088 
:::90.190.106.33:52274  ESTABLISHED 15723/java
tcp0  0 :::90.190.106.33:38391 
:::90.190.106.33:8088   ESTABLISHED 15723/java
tcp0  0 

Job is reported as complete on history server while on console it shows as only half way thru

2014-08-28 Thread S.L
Hi All,

I am running an MRV1 job on a Hadoop YARN 2.3.0 cluster. The problem is that
when I submit this job, YARN creates multiple applications for the submitted
job, and the last application running in YARN is marked as complete even
though on the console it is reported as only 58% complete. I have confirmed
that it is also not printing the log statements that it is supposed to print
when the job is actually complete.

Please see the output from the job submission console below. It just stops
at 58%, and the job history server and YARN cluster UI report that this job
has already succeeded.

14/08/28 08:36:19 INFO mapreduce.Job:  map 54% reduce 0%
14/08/28 08:44:13 INFO mapreduce.Job:  map 55% reduce 0%
14/08/28 08:52:16 INFO mapreduce.Job:  map 56% reduce 0%
14/08/28 08:59:22 INFO mapreduce.Job:  map 57% reduce 0%
14/08/28 09:07:33 INFO mapreduce.Job:  map 58% reduce 0%

Thanks.


org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi,

I use Hadoop 2.4.1 and I got an "org.apache.hadoop.io.compress.SnappyCodec not
found" error:

hadoop checknative
14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize 
native-bzip2 library system-native, will use pure-Java version
14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully loaded & initialized 
native-zlib library
Native library checking:
hadoop: true 
/mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libhadoop.so
zlib:   true /lib64/libz.so.1
snappy: true 
/mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libsnappy.so.1
lz4:true revision:99
bzip2:  false

(smoke test is ok)
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar  teragen 
30 /tmp/teragenout
14/08/29 07:40:41 INFO mapreduce.Job: Running job: job_1409253811850_0002
14/08/29 07:40:53 INFO mapreduce.Job: Job job_1409253811850_0002 running in 
uber mode : false
14/08/29 07:40:53 INFO mapreduce.Job:  map 0% reduce 0%
14/08/29 07:41:00 INFO mapreduce.Job:  map 50% reduce 0%
14/08/29 07:41:01 INFO mapreduce.Job:  map 100% reduce 0%
14/08/29 07:41:02 INFO mapreduce.Job: Job job_1409253811850_0002 completed 
successfully
14/08/29 07:41:02 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=197312
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=167
HDFS: Number of bytes written=3000
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters 
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=11925
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11925
Total vcore-seconds taken by all map tasks=11925
Total megabyte-seconds taken by all map tasks=109900800
Map-Reduce Framework
Map input records=30
Map output records=30
Input split bytes=167
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=22
CPU time spent (ms)=1910
Physical memory (bytes) snapshot=357318656
Virtual memory (bytes) snapshot=1691631616
Total committed heap usage (bytes)=401997824
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=644086318705578
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=3000
14/08/29 07:41:03 INFO terasort.TeraSort: starting
14/08/29 07:41:03 INFO input.FileInputFormat: Total input paths to process : 2


However I got an "org.apache.hadoop.io.compress.SnappyCodec not found" error
when running the spark smoke test program:

scala> inFILE.first()
java.lang.RuntimeException: Error in configuring object 
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.RDD.take(RDD.scala:983)
at org.apache.spark.rdd.RDD.first(RDD.scala:1015)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
at $iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC.<init>(<console>:22)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 

Re: Local file system to access hdfs blocks

2014-08-28 Thread Demai Ni
Stanley and all,

thanks. I will write a client application to explore this path. A quick
question again.
Using the fsck command, I can retrieve all the necessary info:
$ hadoop fsck /tmp/list2.txt -files -blocks -racks
.
 BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2
[/default/10.122.195.198:50010, /default/10.122.195.196:50010]

However, using getFileBlockLocations(), I can't get the block name/id info,
such as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025; it seems
BlockLocation doesn't expose that info publicly:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html

Is there another entry point, something fsck is using? thanks
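
One possible entry point is the HDFS-internal LocatedBlock/ExtendedBlock
classes; they are marked private/unstable, so this is only a hedged sketch
(Hadoop 2.x signatures, illustrative path) that may break between versions:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class BlockIds {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // On HDFS, open() returns an HdfsDataInputStream, which exposes the blocks
    HdfsDataInputStream in =
        (HdfsDataInputStream) fs.open(new Path("/tmp/list2.txt"));
    try {
      for (LocatedBlock lb : in.getAllBlocks()) {
        ExtendedBlock b = lb.getBlock();
        // e.g. BP-...:blk_1073742025 plus the datanodes holding the replicas
        System.out.println(b.getBlockPoolId() + ":" + b.getBlockName()
            + " " + Arrays.toString(lb.getLocations()));
      }
    } finally {
      in.close();
    }
  }
}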

Demai




On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi s...@pivotal.io wrote:

 As far as I know, there's no combination of hadoop API can do that.
 You can easily get the location of the block (on which DN), but there's no
 way to get the local address of that block file.



 On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni nid...@gmail.com wrote:

 Yehia,

 No problem at all. I really appreciate your willingness to help. Yeah.
 now I am able to get such information through two steps, and the first step
 will be either hadoop fsck or getFileBlockLocations(). and then search
 the local filesystem, my cluster is using the default from CDH, which is
 /dfs/dn

 I would like to it programmatically, so wondering whether someone already
 done it? or maybe better a hadoop API call already implemented for this
 exact purpose

 Demai


 On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater y.z.elsha...@gmail.com
 wrote:

 Hi Demai,

 Sorry, I missed that you are already tried this out. I think you can
 construct the block location on the local file system if you have the block
 pool id and the block id. If you are using cloudera distribution, the
 default location is under /dfs/dn ( the value of dfs.data.dir,
 dfs.datanode.data.dir configuration keys).

 Thanks
 Yehia


 On 27 August 2014 21:20, Yehia Elshater y.z.elsha...@gmail.com wrote:

 Hi Demai,

 You can use fsck utility like the following:

 hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks

 This will display all the information you need about the blocks of your
 file.

 Hope it helps.
 Yehia


 On 27 August 2014 20:18, Demai Ni nid...@gmail.com wrote:

 Hi, Stanley,

 Many thanks. Your method works. For now, I can have two steps approach:
 1) getFileBlockLocations to grab hdfs BlockLocation[]
 2) use local file system call(like find command) to match the block to
 files on local file system .

 Maybe there is an existing Hadoop API to return such info in already?

 Demai on the run

 On Aug 26, 2014, at 9:14 PM, Stanley Shi s...@pivotal.io wrote:

 I am not sure this is what you want but you can try this shell command:

 find [DATANODE_DIR] -name [blockname]


 On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni nid...@gmail.com wrote:

 Hi, folks,

 New in this area. Hopefully to get a couple pointers.

 I am using Centos and have Hadoop set up using cdh5.1(Hadoop 2.3)

 I am wondering whether there is a interface to get each hdfs block
 information in the term of local file system.

 For example, I can use Hadoop fsck /tmp/test.txt -files -blocks
 -racks to get blockID and its replica on the nodes, such as: repl =3[
 /rack/hdfs01, /rack/hdfs02...]

  With such info, is there a way to
 1) login to hfds01, and read the block directly at local file system
 level?


 Thanks

 Demai on the run




 --
 Regards,
 *Stanley Shi,*







 --
 Regards,
 *Stanley Shi,*




Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread Tsuyoshi OZAWA
Hi,

It looks like a classpath problem on the Spark side.

Thanks,
- Tsuyoshi
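
For readers who hit the same error: hadoop checknative in the quoted mail below
reports snappy: true, so the native library itself is fine; what is usually
missing is the snappy-java jar and the Hadoop native library directory on the
Spark driver and executor paths. A sketch of one way to do that, assuming
Spark 1.x and the install layout quoted below (the jar name and paths are
examples, adjust to your cluster), is to add something like this to
conf/spark-defaults.conf:

spark.driver.extraClassPath     /mnt/hadoop/hadoop-2.4.1_snappy/share/hadoop/common/lib/snappy-java-1.0.4.1.jar
spark.executor.extraClassPath   /mnt/hadoop/hadoop-2.4.1_snappy/share/hadoop/common/lib/snappy-java-1.0.4.1.jar
spark.driver.extraLibraryPath   /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64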

On Fri, Aug 29, 2014 at 8:49 AM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
 Hi,

 I use Hadoop 2.4.1, and I got an "org.apache.hadoop.io.compress.SnappyCodec not
 found" error:

 hadoop checknative
 14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize
 native-bzip2 library system-native, will use pure-Java version
 14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully loaded & initialized
 native-zlib library
 Native library checking:
 hadoop: true
 /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libhadoop.so
 zlib:   true /lib64/libz.so.1
 snappy: true
 /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libsnappy.so.1
 lz4:true revision:99
 bzip2:  false

 (smoke test is ok)
 bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
 teragen 30 /tmp/teragenout
 14/08/29 07:40:41 INFO mapreduce.Job: Running job: job_1409253811850_0002
 14/08/29 07:40:53 INFO mapreduce.Job: Job job_1409253811850_0002 running in
 uber mode : false
 14/08/29 07:40:53 INFO mapreduce.Job:  map 0% reduce 0%
 14/08/29 07:41:00 INFO mapreduce.Job:  map 50% reduce 0%
 14/08/29 07:41:01 INFO mapreduce.Job:  map 100% reduce 0%
 14/08/29 07:41:02 INFO mapreduce.Job: Job job_1409253811850_0002 completed
 successfully
 14/08/29 07:41:02 INFO mapreduce.Job: Counters: 31
 File System Counters
 FILE: Number of bytes read=0
 FILE: Number of bytes written=197312
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=167
 HDFS: Number of bytes written=3000
 HDFS: Number of read operations=8
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=4
 Job Counters
 Launched map tasks=2
 Other local map tasks=2
 Total time spent by all maps in occupied slots (ms)=11925
 Total time spent by all reduces in occupied slots (ms)=0
 Total time spent by all map tasks (ms)=11925
 Total vcore-seconds taken by all map tasks=11925
 Total megabyte-seconds taken by all map tasks=109900800
 Map-Reduce Framework
 Map input records=30
 Map output records=30
 Input split bytes=167
 Spilled Records=0
 Failed Shuffles=0
 Merged Map outputs=0
 GC time elapsed (ms)=22
 CPU time spent (ms)=1910
 Physical memory (bytes) snapshot=357318656
 Virtual memory (bytes) snapshot=1691631616
 Total committed heap usage (bytes)=401997824
 org.apache.hadoop.examples.terasort.TeraGen$Counters
 CHECKSUM=644086318705578
 File Input Format Counters
 Bytes Read=0
 File Output Format Counters
 Bytes Written=3000
 14/08/29 07:41:03 INFO terasort.TeraSort: starting
 14/08/29 07:41:03 INFO input.FileInputFormat: Total input paths to process :
 2


 However, I got an "org.apache.hadoop.io.compress.SnappyCodec not found" error when
 running the Spark smoke test program:

 scala> inFILE.first()
 java.lang.RuntimeException: Error in configuring object
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
 at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
 at org.apache.spark.rdd.RDD.take(RDD.scala:983)
 at org.apache.spark.rdd.RDD.first(RDD.scala:1015)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
 at $iwC$$iwC$$iwC.<init>(<console>:20)
 at $iwC$$iwC.<init>(<console>:22)
 at $iwC.<init>(<console>:24)
 at <init>(<console>:26)
 at .<init>(<console>:30)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
 at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
 at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
 at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
 at
 

How to retrieve cached data in datanode? (Centralized cache management)

2014-08-28 Thread 남윤민
 Hello, I have a question about using cached data in memory via centralized
cache management. I cached the data I want to use through the CLI (hdfs
cacheadmin -addDirectives ...). Then, when I write my MapReduce application,
how can I read the cached data in memory? Here is the source code from my
MapReduce application:

  System.out.println("Ready for loading customer table from Centralized Cache in DataNode");
  System.out.println("Connecting HDFS... at " + hdfsURI.toString());
  DFSClient dfs = new DFSClient(hdfsURI, new Configuration());
  CacheDirectiveInfo info = new CacheDirectiveInfo.Builder()
      .setPath(new Path("path in HDFS for cached data"))
      .setPool("cache")
      .build();
  CacheDirectiveEntry cachedFile = dfs.listCacheDirectives(info).next();
  System.out.println("We got cachedFile! ID: " + cachedFile.getInfo().getId()
      + ", Path: " + cachedFile.getInfo().getPath()
      + ", CachedPool: " + cachedFile.getInfo().getPool());
  System.out.println("Open DFSInputStream to read cachedFile to ByteBuffer");
  DFSInputStream in = dfs.open(cachedFile.getInfo().getPath().toString());
  ElasticByteBufferPool bufPool = new ElasticByteBufferPool();
  ByteBuffer buf = ByteBuffer.allocate(1);
  System.out.println("Generating Off-Heap ByteBuffer! size: " + buf.capacity());
  in.read(buf);
  buf.flip(); // Flip: ready for reading data after writing data into buffer
  System.out.println("Zero-Copying cached file into buffer!");

Is this the right source code for using the centralized cache management feature?
Thanks


// Yoonmin Nam
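
For reference, reading a file that has been cached this way does not require
going through DFSClient or listCacheDirectives(); the directive only tells the
DataNodes what to pin, and the listing call above is mainly useful for checking
that the directive exists. As far as I can tell, the intended client-side
counterpart is the zero-copy read API on FSDataInputStream (Hadoop 2.3+,
HasEnhancedByteBufferAccess). A minimal sketch, where the path is a placeholder
for a file already cached with hdfs cacheadmin:

import java.nio.ByteBuffer;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ElasticByteBufferPool;

public class ZeroCopyCachedRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path cached = new Path("/path/in/hdfs/for/cached/data");  // placeholder
    ElasticByteBufferPool pool = new ElasticByteBufferPool();

    FSDataInputStream in = fs.open(cached);
    try {
      ByteBuffer buf;
      // read(pool, maxLength, opts) hands back a buffer that is backed by an
      // mmap of the local block file when a zero-copy read is possible (and
      // falls back to a normal copy otherwise); it returns null at end of file.
      // SKIP_CHECKSUMS allows the mmap fast path.
      while ((buf = in.read(pool, 1024 * 1024,
                            EnumSet.of(ReadOption.SKIP_CHECKSUMS))) != null) {
        System.out.println("read " + buf.remaining() + " bytes");
        in.releaseBuffer(buf);  // always hand the buffer back to the stream
      }
    } finally {
      in.close();
    }
  }
}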