Re: Hadoop and Hbase setup

2014-02-13 Thread lars hofhansl
Both Hadoop 1 and Hadoop 2 work. If you start from scratch you should probably 
start with Hadoop 2.
Note that if you want to use Hadoop 2.2.x you need to change the protobuf 
dependency in HBase's pom.xml to 2.5.
(there's a certain irony here that the protocol/library we use to get version 
compatibility causes us version compatibility issues)


Also note that the prebuilt tarballs are all built against Hadoop 1. For Hadoop 2 you 
need to build HBase from source (which is not hard).
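For anyone who wants a starting point, here is a minimal sketch of such a build. It assumes a Maven build of the 0.94.16 source and the hadoop.profile/protobuf.version property names; verify both against the pom.xml of the release you actually download.

# Download and unpack the HBase 0.94.16 source (file name is a placeholder).
tar xzf hbase-0.94.16.tar.gz && cd hbase-0.94.16

# Edit pom.xml and set the protobuf dependency (protobuf.version) to 2.5.0,
# which is what Hadoop 2.2.x ships with.

# Build against the Hadoop 2 profile, skipping tests; add assembly:single if you
# also want a distributable tarball (the exact goal may vary by release).
mvn clean install -DskipTests -Dhadoop.profile=2.0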

-- Lars




 From: Vimal Jain 
To: "u...@hbase.apache.org" ; "user@hadoop.apache.org" 
 
Sent: Thursday, February 13, 2014 11:29 PM
Subject: Hadoop and Hbase setup
 

Hi,
I am planning to install Hadoop and HBase on a 2-node cluster.
I have chosen HBase 0.94.16 (the current stable version).
I am confused about which Hadoop version to choose.
I see on Hadoop's download page that there are two stable series, 1.x and 2.x.
Which one should I use?
What are the major differences between the two?

-- 
Thanks and Regards,
Vimal Jain

Copying data from one Hbase cluster to Another Hbase cluster

2014-02-13 Thread Vimal Jain
Hi,
I have HBase and Hadoop set up in pseudo-distributed mode in production.
Now I am planning to move from pseudo-distributed mode to fully distributed
mode (a 2-node cluster).
My existing Hadoop and HBase versions are 1.1.2 and 0.94.7, respectively.
I am planning to run the fully distributed setup with HBase 0.94.16
and Hadoop (either 1.x or 2.x, not yet decided).

What are the different ways to copy data from the existing setup (pseudo-distributed
mode) to the new setup (a 2-node fully distributed cluster)?

Please help.
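For reference, the two approaches most people reach for are HBase's Export/Import MapReduce jobs and CopyTable. The sketch below is illustrative only: the table name, paths, ports and ZooKeeper quorum are placeholders, so check the options against the 0.94 tool documentation before running anything.

# Option 1: Export to HDFS on the old cluster, distcp across, Import on the new one.
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable
hadoop distcp hdfs://old-nn:9000/backup/mytable hdfs://new-nn:9000/backup/mytable
hbase org.apache.hadoop.hbase.mapreduce.Import mytable /backup/mytable   # run on the new cluster

# Option 2: CopyTable straight into the new cluster, addressed by its ZooKeeper quorum.
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=new-zk-host:2181:/hbase mytable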

-- 
Thanks and Regards,
Vimal Jain


Hadoop and Hbase setup

2014-02-13 Thread Vimal Jain
Hi,
I am planning to install Hadoop and HBase on a 2-node cluster.
I have chosen HBase 0.94.16 (the current stable version).
I am confused about which Hadoop version to choose.
I see on Hadoop's download page that there are two stable series, 1.x and 2.x.
Which one should I use?
What are the major differences between the two?

-- 
Thanks and Regards,
Vimal Jain


How to submit the patch MAPREDUCE-4490.patch which works for branch-1.2, not trunk?

2014-02-13 Thread sam liu
Hi Experts,

I have been working on the JIRA
https://issues.apache.org/jira/browse/MAPREDUCE-4490 and attached
MAPREDUCE-4490.patch, which fixes it. I would like to contribute my patch to the
community, but I have run into some issues.

MAPREDUCE-4490 is an issue in the Hadoop 1.x versions, and my patch is based on
the latest code of origin/branch-1.2. However, the current trunk is based on YARN
and no longer has this issue, so my patch cannot be applied to the current trunk,
and there is actually no need to generate a similar patch for trunk at all.

How can I submit MAPREDUCE-4490.patch only to origin/branch-1.2 and not to
trunk? Is that allowed by Apache Hadoop?
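For what it's worth, the usual convention (an assumption worth confirming on the Hadoop "How To Contribute" wiki) is to name the patch after the branch it targets and say so in a JIRA comment; a sketch using the git mirror:

# Generate the diff from the tip of branch-1.2 and name the file so reviewers
# can see it is branch-specific.
git checkout branch-1.2
git diff > MAPREDUCE-4490-branch-1.2.patch

# Attach the file to the JIRA and note that it applies to branch-1.2 only.
# The precommit build runs against trunk, so a "patch does not apply to trunk"
# result is expected; a committer can still apply it to the branch.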

Thanks!


How to ascertain why LinuxContainer dies?

2014-02-13 Thread Jay Vyas
I have a Linux container that dies. The NodeManager logs only say:

WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
Exception from container-launch :
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
at org.apache.hadoop.util.Shell.run(Shell.java:129)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Where can I find the root cause of the non-zero exit code?
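In case it helps, the container's own stdout/stderr/syslog usually hold the real error. The paths below are placeholders built from yarn.nodemanager.log-dirs and the application/container IDs, so adjust them to your configuration.

# On the NodeManager host: each container writes stdout/stderr/syslog under the
# log directory configured by yarn.nodemanager.log-dirs.
ls  <log-dir>/application_<appId>/container_<containerId>/
cat <log-dir>/application_<appId>/container_<containerId>/stderr

# If log aggregation is enabled, the same logs can be pulled after the app finishes:
yarn logs -applicationId application_<appId>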

-- 
Jay Vyas
http://jayunit100.blogspot.com


Issue with appending to a file, then closing and reopening the same file

2014-02-13 Thread ch huang
hi, mailing list:
   I use Scribe to receive data from an app and write it to HDFS. When the
system has a high number of concurrent connections it causes HDFS errors like
the following; incoming connections get blocked and Tomcat dies.

In the directory /user/hive/warehouse/dsp.db/request, the file data_0 is rotated
each hour, but Scribe (we modified the Scribe code) switches back to the same
file when the rotation happens, so data_0 is closed and reopened. When the load
is high, I can observe corrupt replicas of data_0. How can I handle this? Thanks.

[Thu Feb 13 23:59:59 2014] "[hdfs] disconnected fileSys for
/user/hive/warehouse/dsp.db/request"
[Thu Feb 13 23:59:59 2014] "[hdfs] closing
/user/hive/warehouse/dsp.db/request/2014-02-13/data_0"
[Thu Feb 13 23:59:59 2014] "[hdfs] disconnecting fileSys for
/user/hive/warehouse/dsp.db/request/2014-02-13/data_0"
[Thu Feb 13 23:59:59 2014] "[hdfs] disconnected fileSys for
/user/hive/warehouse/dsp.db/request/2014-02-13/data_0"
[Thu Feb 13 23:59:59 2014] "[hdfs] Connecting to HDFS for
/user/hive/warehouse/dsp.db/request/2014-02-13/data_0"
[Thu Feb 13 23:59:59 2014] "[hdfs] opened for append
/user/hive/warehouse/dsp.db/request/2014-02-13/data_0"
[Thu Feb 13 23:59:59 2014] "[dsp_request] Opened file
 for writing"
[Thu Feb 13 23:59:59 2014] "[dsp_request] 23:59 rotating file
<2014-02-13/data> old size <10027577955> max size <100>"
[Thu Feb 13 23:59:59 2014] "[hdfs] Connecting to HDFS for
/user/hive/warehouse/dsp.db/request"
[Thu Feb 13 23:59:59 2014] "[hdfs] disconnecting fileSys for
/user/hive/warehouse/dsp.db/request"
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as
192.168.11.13:50010
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block
BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in
pipeline 192.168.11.12:50010, 192.168.11.13:50010, 192.168.11.14:50010,
192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.13:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as
192.168.11.10:50010
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block
BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in
pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.10:50010,
192.168.11.15:50010: bad datanode 192.168.11.10:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as
192.168.11.15:50010
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block
BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in
pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.15:50010: bad
datanode 192.168.11.15:50010


/user/hive/warehouse/dsp.db/request/2014-02-13/data_0:
blk_433572108425800355_3411509 (replicas: l: 1 d: 0 c: 4 e: 0)
192.168.11.12:50010 :  192.168.11.13:50010(corrupt) :
192.168.11.14:50010(corrupt)
:  192.168.11.10:50010(corrupt) :  192.168.11.15:50010(corrupt) :
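Not a fix, but for diagnosing this kind of pipeline/corrupt-replica state, fsck on the affected path is usually the first step; a sketch using the stock fsck options and the path from the log above:

# Show block locations and replica health for the file being appended to.
hdfs fsck /user/hive/warehouse/dsp.db/request/2014-02-13/data_0 -files -blocks -locations

# List any blocks fsck considers corrupt under the parent directory.
hdfs fsck /user/hive/warehouse/dsp.db/request -list-corruptfileblocks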


Re: job submission between 2 YARN clusters

2014-02-13 Thread Anfernee Xu
Hi Zhijie,

I agree. What I'm doing in the standalone app is this: the app loads the
first cluster's Configuration (mapred-site.xml, yarn-site.xml) as its default
configuration and submits the MR job to the first cluster with it; after that
job finishes, I submit the second job to the second cluster with almost the same
Configuration, except that I change the property yarn.resourcemanager.address to
point to the second cluster's RM.
My guess is that the job.xml of the second job holds all the property values of
the first cluster (such as yarn.resourcemanager.scheduler.address) and overrides
the properties specified on the second cluster (in yarn-site.xml, for example),
so it talks to the wrong RM when the NM launches the container.

Please comment.

BTW, I just tweaked the standalone app so that it loads the second cluster's
configuration (yarn-site.xml) before submitting the second job, and it seems to
be working.
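For what it's worth, a simple way to keep the two clusters' settings from bleeding into each other is to build each submission from its own conf directory. A sketch from the shell, with made-up paths and jar/class names:

# Submit job1 with cluster #1's client configs, then job2 with cluster #2's,
# without ever layering one cluster's *-site.xml on top of the other's.
HADOOP_CONF_DIR=/etc/hadoop/conf.cluster1 hadoop jar my-job.jar MyJob1 <args>
HADOOP_CONF_DIR=/etc/hadoop/conf.cluster2 hadoop jar my-job.jar MyJob2 <args>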

Thanks


On Thu, Feb 13, 2014 at 4:28 PM, Zhijie Shen  wrote:

> Hi Anfernee,
>
> It sounds most likely that the config is somehow getting mixed up. You have two
> sets of configs to start the two YARN clusters separately, don't you? If you
> provide more detail about how you configured the two clusters, it will be easier
> for the community to understand your problem.
>
> - Zhijie
>
>
> On Thu, Feb 13, 2014 at 11:34 AM, Anfernee Xu wrote:
>
>> I'm on the YARN 2.2.0 release. I configured 2 single-node clusters on my
>> laptop (just for a POC; all port conflicts are resolved, I can see the NM
>> and RM are up, and the web UI shows everything is fine), and I also have a
>> standalone Java application. The Java application is a kind of job client:
>> it submits job1 to Cluster #1 and, once that job is finished, submits
>> job2 to Cluster #2.
>>
>> What I'm seeing is that job1 is fine, but job2 fails. I looked at the
>> source code and found that the NM in cluster 2 was talking to cluster 1's RM
>> via the wrong yarn.resourcemanager.scheduler.address. How does that happen? I
>> just want to make sure there's no such issue in a real deployment.
>>
>> --
>> --Anfernee
>>
>
>
>
> --
> Zhijie Shen
> Hortonworks Inc.
> http://hortonworks.com/




-- 
--Anfernee


Re: job submission between 2 YARN clusters

2014-02-13 Thread Zhijie Shen
Hi Anfernee,

It sounds most likely that the config is somehow getting mixed up. You have two sets
of configs to start the two YARN clusters separately, don't you? If you provide more
detail about how you configured the two clusters, it will be easier for the community
to understand your problem.

- Zhijie


On Thu, Feb 13, 2014 at 11:34 AM, Anfernee Xu  wrote:

> I'm on the YARN 2.2.0 release. I configured 2 single-node clusters on my
> laptop (just for a POC; all port conflicts are resolved, I can see the NM
> and RM are up, and the web UI shows everything is fine), and I also have a
> standalone Java application. The Java application is a kind of job client:
> it submits job1 to Cluster #1 and, once that job is finished, submits
> job2 to Cluster #2.
>
> What I'm seeing is that job1 is fine, but job2 fails. I looked at the
> source code and found that the NM in cluster 2 was talking to cluster 1's RM
> via the wrong yarn.resourcemanager.scheduler.address. How does that happen? I
> just want to make sure there's no such issue in a real deployment.
>
> --
> --Anfernee
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: umount bad disk

2014-02-13 Thread Siddharth Tiwari
Try doing umount -l

Sent from my iPhone

> On Feb 13, 2014, at 11:10 AM, "Arpit Agarwal"  
> wrote:
> 
> bcc'ed hadoop-user
> 
> Lei, perhaps hbase-user can help.
> 
> -- Forwarded message --
> From: lei liu 
> Date: Thu, Feb 13, 2014 at 1:04 AM
> Subject: umount bad disk
> To: user@hadoop.apache.org
> 
> 
> I use HBase0.96 and CDH4.3.1.
> 
> I use Short-Circuit Local Read:
> 
>   <property>
>     <name>dfs.client.read.shortcircuit</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.domain.socket.path</name>
>     <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
>   </property>
> 
> When one disk goes bad, the RegionServer has files open on that disk, so I
> can't unmount it. For example:
> sudo umount -f /disk10
> umount2: Device or resource busy
> umount: /disk10: device is busy
> umount2: Device or resource busy
> umount: /disk10: device is busy
> 
> I have to stop the RegionServer in order to run the umount command.
> 
> How can I remove the bad disk without stopping the RegionServer?
> 
> Thanks,
> 
> LiuLei


job submission between 2 YARN clusters

2014-02-13 Thread Anfernee Xu
I'm on the YARN 2.2.0 release. I configured 2 single-node clusters on my
laptop (just for a POC; all port conflicts are resolved, I can see the NM
and RM are up, and the web UI shows everything is fine), and I also have a
standalone Java application. The Java application is a kind of job client:
it submits job1 to Cluster #1 and, once that job is finished, submits
job2 to Cluster #2.

What I'm seeing is that job1 is fine, but job2 fails. I looked at the
source code and found that the NM in cluster 2 was talking to cluster 1's RM
via the wrong yarn.resourcemanager.scheduler.address. How does that happen? I
just want to make sure there's no such issue in a real deployment.

-- 
--Anfernee


The Activities of Apache Hadoop Community

2014-02-13 Thread Akira AJISAKA
Hi all,

We collected and analyzed JIRA tickets to investigate
the activities of Apache Hadoop Community in 2013.

http://ajisakaa.blogspot.com/2014/02/the-activities-of-apache-hadoop.html

We counted the number of organizations, the lines
of code, and the number of issues. As a result, we
confirmed that all of them are increasing and that the
Hadoop community is getting more active.
We appreciate the continuous contributions of developers,
and we hope the activity will expand further in 2014.

Thanks,
Akira


Hadoop Native Libs

2014-02-13 Thread Gobinathan SP (HCL Financial Services)
Hi all,

My team has set up Hadoop 2.2.0 and is having issues loading the Hadoop native 
libraries. We've got a pre-built version of the libs, but unfortunately our GNU libc 
version is 2.11.3, whereas the libraries require 2.12.

It would be very helpful if, by any chance, you have native libraries built against 
2.11.3. Kindly share them with me.

Following are the libraries, I am looking for.

1.   Libhdfs

2.   Libhadoop

3.   Libsnappy

4.   Libhadoopsnappy

5.   Libhadooputils
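If nobody has a matching pre-built copy, the native libraries can usually be rebuilt against your local glibc from the Hadoop 2.2.0 source release with the native Maven profile; a sketch (the prerequisite list is an assumption about a typical build box):

# Check the local glibc version first.
ldd --version

# Typical prerequisites: gcc, make, cmake, zlib-devel, openssl-devel, protobuf 2.5,
# and snappy-devel if you also want the snappy bindings.
tar xzf hadoop-2.2.0-src.tar.gz && cd hadoop-2.2.0-src
mvn package -Pdist,native -DskipTests -Dtar

# The resulting libhadoop.so / libhdfs.so land under
# hadoop-dist/target/hadoop-2.2.0/lib/native/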

Regards,
Gopi




Re: umount bad disk

2014-02-13 Thread Brandon Freeman
LiuLei,

Using your example, you can run umount -l /disk10 (lower-case L) on the disk; it will 
lazily unmount and will not wait for the open files to be released.
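Before the lazy unmount it can also help to see which processes still hold files open on the mount (most likely the DataNode and the RegionServer); a quick sketch with standard Linux tools:

# List processes with open files on the mount point.
fuser -vm /disk10        # or: lsof /disk10

# Lazy unmount: detaches the mount now and cleans up once the last open file closes.
umount -l /disk10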

Thanks,

---
Brandon Freeman
System Engineer
Explorys, Inc.
8501 Carnegie Ave., Suite 200
Cleveland, OH 44106
http://www.explorys.com | 216-235-5021

From: Arpit Agarwal <aagar...@hortonworks.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Thursday, February 13, 2014 12:09 PM
To: "u...@hbase.apache.org" <u...@hbase.apache.org>
Subject: Fwd: umount bad disk

bcc'ed hadoop-user

Lei, perhaps hbase-user can help.

-- Forwarded message --
From: lei liu <liulei...@gmail.com>
Date: Thu, Feb 13, 2014 at 1:04 AM
Subject: umount bad disk
To: user@hadoop.apache.org


I use HBase0.96 and CDH4.3.1.

I use Short-Circuit Local Read:


  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
  </property>

When one disk goes bad, the RegionServer has files open on that disk, so I
can't unmount it. For example:
sudo umount -f /disk10
umount2: Device or resource busy
umount: /disk10: device is busy
umount2: Device or resource busy
umount: /disk10: device is busy

I have to stop the RegionServer in order to run the umount command.

How can I remove the bad disk without stopping the RegionServer?


Thanks,


LiuLei











har file globbing problem

2014-02-13 Thread Dan Buchan
We have a dataset of ~8 million files, about 0.5 to 2 MB each, and we're
having trouble getting them analysed after building a har file.

The files are already in a pre-existing directory structure, with two
nested sets of dirs and 20-100 PDFs at the bottom of each leaf of the dir
tree.

user->hadoop->/all_the_files/*/*/*.pdf

It was trivial to move these to hdfs and to build a har archive; I used the
following command to make the archive

bin/hadoop archive -archiveName test.har -p /user/hadoop/
all_the_files/*/*/ /user/hadoop/

When I list the contents of the har (bin/hadoop fs -lsr
har:///user/hadoop/epc_test.har), everything looks as I'd expect.

When we come to run the hadoop job with this command, trying to wildcard
the archive:

bin/hadoop jar My.jar har:///user/hadoop/test.har/all_the_files/*/*/ output

it fails with the following exception

Exception in thread "main" java.lang.IllegalArgumentException: Can not
create a Path from an empty string

Running the job with the non-archived files is fine i.e:

bin/hadoop jar My.jar all_the_files/*/*/ output

However this only works for our modest test set of files. Any substantial
number of files quickly makes the namenode run out of memory.

Can you use file globs with the har archives? Is there a different way to
build the archive to just include the files which I've missed?
I appreciate that a sequence file might be a better fit for this task but
I'd like to know the solution to this issue if there is one.
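One workaround sketch (not a confirmed fix for har glob handling) is to expand the nested directories with the FsShell, which does resolve har:// listings, and pass the concrete paths to the job instead of a glob; this assumes your driver hands its first argument to FileInputFormat, which accepts a comma-separated list of paths.

# Print the directories inside the archive (last field of each 'd' entry).
bin/hadoop fs -lsr har:///user/hadoop/test.har/all_the_files | awk '$1 ~ /^d/ {print $NF}'

# Then run the job with explicit, comma-separated inputs (paths are placeholders):
bin/hadoop jar My.jar har:///user/hadoop/test.har/all_the_files/dirA/sub1,har:///user/hadoop/test.har/all_the_files/dirA/sub2 output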

-- 
 

t.  020 7739 3277
a. 131 Shoreditch High Street, London E1 6JE


Re: Fwd: umount bad disk

2014-02-13 Thread Hiran Chaudhuri
You may want to check this out:

http://stackoverflow.com/questions/323146/how-to-close-a-file-descriptor-from-another-process-in-unix-systems

Hiran Chaudhuri
System Support Programmer / Analyst
Business Development (DB)
Hosting & Regional Services (BH)
Amadeus Data Processing GmbH
Berghamer Strasse 6
85435 Erding
T: +49-8122-43x3662
hiran.chaudh...@amadeus.com
amadeus.com





From:   Arpit Agarwal 
To: u...@hbase.apache.org, 
Date:   13/02/2014 18:10
Subject:Fwd: umount bad disk



bcc'ed hadoop-user

Lei, perhaps hbase-user can help.

-- Forwarded message --
From: lei liu 
Date: Thu, Feb 13, 2014 at 1:04 AM
Subject: umount bad disk
To: user@hadoop.apache.org


I use HBase0.96 and CDH4.3.1.

I use Short-Circuit Local Read:

  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
  </property>

When one disk goes bad, the RegionServer has files open on that disk, so I
can't unmount it. For example:
sudo umount -f /disk10
umount2: Device or resource busy
umount: /disk10: device is busy
umount2: Device or resource busy
umount: /disk10: device is busy

I have to stop the RegionServer in order to run the umount command.

How can I remove the bad disk without stopping the RegionServer?

Thanks,

LiuLei
















Re: Unsubscribe

2014-02-13 Thread Ted Yu
See https://hadoop.apache.org/mailing_lists.html#User


On Thu, Feb 13, 2014 at 9:39 AM, Scott Kahler wrote:

>  Unsubscribe
>
>


Unsubscribe

2014-02-13 Thread Scott Kahler
Unsubscribe



Fwd: umount bad disk

2014-02-13 Thread Arpit Agarwal
bcc'ed hadoop-user

Lei, perhaps hbase-user can help.

-- Forwarded message --
From: lei liu 
Date: Thu, Feb 13, 2014 at 1:04 AM
Subject: umount bad disk
To: user@hadoop.apache.org


I use HBase0.96 and CDH4.3.1.

I use Short-Circuit Local Read:


  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
  </property>

When one disk goes bad, the RegionServer has files open on that disk, so I
can't unmount it. For example:
sudo umount -f /disk10
umount2: Device or resource busy
umount: /disk10: device is busy
umount2: Device or resource busy
umount: /disk10: device is busy

I have to stop the RegionServer in order to run the umount command.

How can I remove the bad disk without stopping the RegionServer?

Thanks,

LiuLei



org.apache.hadoop.mapred.JobSubmissionProtocol.getSystemDir() doesn't throw IOException

2014-02-13 Thread Subroto
Hi,

I am using cdh4.4-mr1 for my scenario with Jobtracker HA.
During a failover of Jobtracker,  the client is not retrying.
The root cause is:
The method
org.apache.hadoop.mapred.JobSubmissionProtocol.getSystemDir() does not declare any 
exception, so when a client failover happens on this API call, the exception 
thrown is java.lang.reflect.UndeclaredThrowableException, which wraps 
java.net.ConnectException.

This is not one of the exceptions on which retry is allowed:
1) ConnectException
2) NoRouteToHostException
3) UnknownHostException
4) StandbyException (or Wrapped StandbyException)
5) ConnectTimeoutException

Ideally the code should check the wrapped exception, or the API should declare 
IOException like the others.

Is this bug already known to the community?
Do we have a workaround for this?

Cheers,
Subroto Sanyal



umount bad disk

2014-02-13 Thread lei liu
I use HBase0.96 and CDH4.3.1.

I use Short-Circuit Local Read:


  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
  </property>

When one disk goes bad, the RegionServer has files open on that disk, so I
can't unmount it. For example:
sudo umount -f /disk10
umount2: Device or resource busy
umount: /disk10: device is busy
umount2: Device or resource busy
umount: /disk10: device is busy

I have to stop the RegionServer in order to run the umount command.

How can I remove the bad disk without stopping the RegionServer?

Thanks,

LiuLei