Re: Local file system to access hdfs blocks

2014-08-29 Thread Demai Ni
Stanley, 

Thanks. 

Btw, I found this JIRA, HDFS-2246, which probably matches what I am looking for.

Demai on the run

On Aug 28, 2014, at 11:34 PM, Stanley Shi  wrote:

> BP-13-7914115-10.122.195.197-14909166276345 is the block pool information
> and blk_1073742025 is the block name;
> 
> these names are "private" to the HDFS system and users should not use them, 
> right?
> But if you really want to know this, you can check the fsck code to see 
> whether they are available;
> 
> 
> On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni  wrote:
>> Stanley and all,
>> 
>> Thanks. I will write a client application to explore this path. A quick 
>> question again.
>> Using the fsck command, I can retrieve all the necessary info
>> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
>> .
>>  BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2
>> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>> 
>> However, using getFileBlockLocations(), I can't get the block name/id info, 
>> such as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025;
>> it seems that BlockLocation doesn't expose that information publicly:
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
>> 
>> Is there another entry point, something fsck is using? Thanks.
>> 
>> Demai
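
A minimal sketch of pulling the block pool id and block name programmatically, going through the same internal client path that fsck uses. DFSClient, LocatedBlock, and ExtendedBlock are private/unstable HDFS interfaces, so treat this as an illustration against Hadoop 2.x rather than a supported API; the class name is made up and the path is the file from the fsck example above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class ListBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.defaultFS from core-site.xml; the client talks to the NameNode directly
    DFSClient client = new DFSClient(FileSystem.getDefaultUri(conf), conf);
    try {
      for (LocatedBlock lb : client.getLocatedBlocks("/tmp/list2.txt", 0,
          Long.MAX_VALUE).getLocatedBlocks()) {
        ExtendedBlock b = lb.getBlock();
        // Prints e.g. BP-...-14909166276345:blk_1073742025 len=8
        System.out.println(b.getBlockPoolId() + ":" + b.getBlockName()
            + " len=" + b.getNumBytes());
        for (DatanodeInfo dn : lb.getLocations()) {
          // Datanodes holding replicas of this block
          System.out.println("  " + dn.getHostName() + ":" + dn.getXferPort());
        }
      }
    } finally {
      client.close();
    }
  }
}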
>> 
>> 
>> 
>> 
>> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi  wrote:
>>> As far as I know, there's no combination of Hadoop APIs that can do that.
>>> You can easily get the location of the block (which DN it is on), but there's no 
>>> way to get the local path of that block file.
>>> 
>>> 
>>> 
>>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni  wrote:
 Yehia,
 
 No problem at all. I really appreciate your willingness to help. Yeah, now 
 I am able to get such information in two steps: the first step 
 is either hadoop fsck or getFileBlockLocations(), and then I search the 
 local filesystem; my cluster is using the default from CDH, which is 
 /dfs/dn.
 
 I would like to do it programmatically, so I am wondering whether someone has 
 already done it, or better, whether there is a Hadoop API call already 
 implemented for this exact purpose.
 
 Demai
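
A rough shell-only version of that two-step flow, assuming it is run on (or via ssh to) a datanode that actually holds a replica, and that /dfs/dn is the data directory:

for b in $(hdfs fsck /tmp/list2.txt -files -blocks | grep -o 'blk_[0-9]*' | sort -u); do
  find /dfs/dn -name "${b}*"
done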
 
 
 On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater  
 wrote:
> Hi Demai,
> 
> Sorry, I missed that you had already tried this out. I think you can 
> construct the block's location on the local file system if you have the 
> block pool id and the block id. If you are using the Cloudera distribution, 
> the default location is under /dfs/dn (the value of the dfs.data.dir / 
> dfs.datanode.data.dir configuration keys).
> 
> Thanks
> Yehia 
> 
> 
> On 27 August 2014 21:20, Yehia Elshater  wrote:
>> Hi Demai,
>> 
>> You can use fsck utility like the following:
>> 
>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>> 
>> This will display all the information you need about the blocks of your 
>> file.
>> 
>> Hope it helps.
>> Yehia
>> 
>> 
>> On 27 August 2014 20:18, Demai Ni  wrote:
>>> Hi, Stanley,
>>> 
>>> Many thanks. Your method works. For now, I have a two-step approach:
>>> 1) getFileBlockLocations() to grab the HDFS BlockLocation[]
>>> 2) use a local file system call (like the find command) to match the block to 
>>> files on the local file system.
>>> 
>>> Maybe there is an existing Hadoop API that already returns such info?
>>> 
>>> Demai on the run
>>> 
>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi  wrote:
>>> 
 I am not sure this is what you want but you can try this shell command:
 
 find [DATANODE_DIR] -name [blockname]
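
For example, with the CDH default data directory and the block name from the fsck output earlier in this thread (both the data directory and the block id depend on your cluster):

 find /dfs/dn -name 'blk_1073742025*'

This should match both the block file itself and its blk_*.meta checksum file under the block pool's finalized directory.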
 
 
 On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni  wrote:
> Hi, folks,
> 
> I'm new in this area and hoping to get a couple of pointers.
> 
> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
> 
> I am wondering whether there is an interface to get each HDFS block's 
> information in terms of the local file system.
> 
> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks 
> -racks" to get the block ID and its replicas on the nodes, such as: 
> repl=3 [/rack/hdfs01, /rack/hdfs02, ...]
> 
> With such info, is there a way to
> 1) log in to hdfs01, and read the block directly at the local file system 
> level?
> 
> 
> Thanks
> 
> Demai on the run
 
 
 
 -- 
 Regards,
 Stanley Shi,
 
>>> 
>>> 
>>> 
>>> -- 
>>> Regards,
>>> Stanley Shi,
>>> 
> 
> 
> 
> -- 
> Regards,
> Stanley Shi,
> 


Hadoop 2.5.0 unit test failures

2014-08-29 Thread Rajat Jain
Hi,

I wanted to know if all the unit tests pass in the hadoop-common project
across various releases. I have never been able to get a clean run on my
machine (CentOS 6.5 / 4GB RAM / tried both Java 6 and Java 7). I have also
attached the document which has the failures that I got while running the
tests.

I ran "mvn clean package install -DskipTests" to compile, and thereafter,
ran "mvn test" from individual subprojects.

In my company, we have forked Apache Hadoop 2.5.0 and we are planning to
deploy a nightly unit test run to make sure we don't introduce any
regressions. Is there a way to get a clean unit-test run, or should I
disable these tests from our suite? I also read somewhere else that there
are a few flaky tests as well.

Any help is appreciated.

https://docs.google.com/a/qubole.com/spreadsheets/d/1bKCclEA0u9VUZvykgaRj_gBqY4xMxTBIGPJml04TXtE/edit#gid=1215903400

Thanks,
Rajat


Re: Hadoop 2.5.0 - HDFS browser-based file view

2014-08-29 Thread Brian C. Huffman
For normal use, yes.  For development, it's useful to start small and be 
able to quickly see the output.


It's certainly not a major issue.  I just didn't know if there was a 
config option to change the behavior.


Thanks,
Brian

On 08/28/2014 02:30 AM, Stanley Shi wrote:
Normally, files in HDFS are intended to be quite big; they are not very 
easy to show in the browser.



On Fri, Aug 22, 2014 at 10:56 PM, Brian C. Huffman 
<bhuff...@etinternational.com> wrote:


All,

I noticed that on Hadoop 2.5.0, when browsing the HDFS
filesystem on port 50070, you can't view a file in the browser.
Clicking a file gives a little popup with metadata and a download
link. Can HDFS be configured to show plaintext file contents in
the browser?

Thanks,
Brian




--
Regards,
*Stanley Shi,*
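
Not a browser configuration switch, but for small files the raw bytes can also be fetched over HTTP through WebHDFS, assuming it is enabled (dfs.webhdfs.enabled). The host name and path below are placeholders; 50070 is the default NameNode HTTP port in 2.5.0, and -L follows the redirect to the datanode that serves the data:

curl -L "http://<namenode-host>:50070/webhdfs/v1/tmp/test.txt?op=OPEN"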





FW: Problem in using Hadoop Eclipse Plugin

2014-08-29 Thread YIMEN YIMGA Gael


From: YIMEN YIMGA Gael ItecCsySat
Sent: Friday 29 August 2014 12:12
To: 'hdfs-u...@hadoop.apache.org'
Subject: Problem in using Hadoop Eclipse Plugin

Hello all,

I did the same, but I'm facing the same error when launching the program.

[inline image attachment omitted]

Here is the error in Eclipse

[inline image attachment omitted]

Please, could you advise in this case?

Regards

GYY


[HDFS] DFSClient does not close a closed socket, resulting in thousands of CLOSE_WAIT sockets, with HDP 2.1/HBase 0.98.0/Hadoop 2.4.0

2014-08-29 Thread Steven Xu
Hello Hadoopers,

When I run HDP 2.1/HBase 0.98.0/Hadoop 2.4.0, I always hit a fatal
problem: DFSClient does not close a closed socket, resulting in thousands of
CLOSE_WAIT sockets. Have you seen the same issue? If so, please share your
experience. Thanks a lot. I have also created issue HDFS-6973 for this.

 

HBase, as an HDFS client, does not close dead connections to the datanode.
This results in over 30K+ CLOSE_WAIT sockets, and at some point HBase cannot
connect to the datanode because there are too many mapped sockets from one
host to another on the same port (50010).
After I restart all region servers, the CLOSE_WAIT count keeps increasing:

$ netstat -an | grep CLOSE_WAIT | wc -l
2545
$ netstat -nap | grep CLOSE_WAIT | grep 6569 | wc -l
2545
$ ps -ef | grep 6569
hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java
-Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
-XX:+UseConcMarkSweepGC
I have also reviewed these issues:
HDFS-5697
HDFS-5671
HDFS-1836
HBASE-9393
I found that the patches for these issues are already in the HBase 0.98/Hadoop
2.4.0 source code, but I do not understand why HBase 0.98/Hadoop 2.4.0 still
has this issue. Please check. Thanks a lot.
The following code was added in
BlockReaderFactory.getRemoteBlockReaderFromTcp(); maybe another bug is causing
my problem:


BlockReaderFactory.java:

  // Some comments here
  private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": trying to create a remote block reader from a " +
          "TCP socket");
    }
    BlockReader blockReader = null;
    while (true) {
      BlockReaderPeer curPeer = null;
      Peer peer = null;
      try {
        curPeer = nextTcpPeer();
        if (curPeer == null) break;
        if (curPeer.fromCache) remainingCacheTries--;
        peer = curPeer.peer;
        blockReader = getRemoteBlockReader(peer);
        return blockReader;
      } catch (IOException ioe) {
        if (isSecurityException(ioe)) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": got security exception while constructing " +
                "a remote block reader from " + peer, ioe);
          }
          throw ioe;
        }
        if ((curPeer != null) && curPeer.fromCache) {
          // Handle an I/O error we got when using a cached peer.  These are
          // considered less serious, because the underlying socket may be
          // stale.
          if (LOG.isDebugEnabled()) {
            LOG.debug("Closed potentially stale remote peer " + peer, ioe);
          }
        } else {
          // Handle an I/O error we got when using a newly created peer.
          LOG.warn("I/O error constructing remote block reader.", ioe);
          throw ioe;
        }
      } finally {
        if (blockReader == null) {
          IOUtils.cleanup(LOG, peer);
        }
      }
    }
    return null;
  }

 



Reflect changes without restarting the data node

2014-08-29 Thread hadoop hive
Hi folks,

Is there any way to have Hadoop re-read a config file like core-site.xml without
restarting the datanode?

For example, like we do for the slaves or exclude file by running:
hadoop dfsadmin -refreshNodes

Thanks
Vikas Srivastava
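
As far as I know, only a few specific settings can be refreshed at runtime in this version range, and a general change to core-site.xml on a datanode still requires a restart. The refresh-style admin commands that do exist (they all act on the NameNode) include:

hdfs dfsadmin -refreshNodes                          # re-read dfs.hosts / dfs.hosts.exclude
hdfs dfsadmin -refreshServiceAcl                     # re-read service-level ACLs (hadoop-policy.xml)
hdfs dfsadmin -refreshUserToGroupsMappings           # refresh user-to-group mappings
hdfs dfsadmin -refreshSuperUserGroupsConfiguration   # re-read proxy-user (impersonation) settings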