PySpark Write File Container exited with a non-zero exit code 143

2021-05-19 Thread Clay McDonald
Hello all,

I'm hoping someone can give me some direction for troubleshooting this issue. I'm 
trying to write from Spark on a Hortonworks (Cloudera) HDP cluster. I ssh 
directly to the first datanode and run PySpark with the following command; 
however, it always fails no matter how much memory I give the YARN containers 
and YARN queues. Any suggestions?



pyspark --queue default --executor-memory 24G

--

HDFS_RAW="/HDFS/Data/Test/Original/MyData_data/"
#HDFS_OUT="/ HDFS/Data/Test/Processed/Convert_parquet/Output"
HDFS_OUT="/tmp"
ENCODING="utf-16"

fileList1=[
'Test _2003.txt'
]
from  pyspark.sql.functions import regexp_replace,col
for f in fileList1:
fname=f
fname_noext=fname.split('.')[0]
df = 
spark.read.option("delimiter","|").option("encoding",ENCODING).option("multiLine",True).option('wholeFile',"true").csv('{}/{}'.format(HDFS_RAW,fname),
 header=True)
lastcol=df.columns[-1]
print('showing {}'.format(fname))
if ('\r' in lastcol):
lastcol=lastcol.replace('\r','')
df=df.withColumn(lastcol, 
regexp_replace(col("{}\r".format(lastcol)), "[\r]", 
"")).drop('{}\r'.format(lastcol))

df.write.format('parquet').mode('overwrite').save("{}/{}".format(HDFS_OUT,fname_noext))



Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
1.0 (TID 4, DataNode01.mydomain.com, executor 5): ExecutorLostFailure (executor 
5 exited caused by one of the running tasks) Reason: Container marked as 
failed: container_e331_1621375512548_0021_01_06 on host: 
DataNode01.mydomain.com. Exit status: 143. Diagnostics: [2021-05-19 
18:09:06.392]Container killed on request. Exit code is 143
[2021-05-19 18:09:06.413]Container exited with a non-zero exit code 143.
[2021-05-19 18:09:06.414]Killed by external signal
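
One thing I plan to try next, in case it helps narrow this down: exit code 143 
seems to mean YARN killed the container for going over its memory limit, and with 
multiLine/wholeFile the whole UTF-16 file lands in a single partition. The sketch 
below would be run as a standalone script via spark-submit; the overhead and 
partition counts are guesses, not tuned values.

from pyspark.sql import SparkSession

# Sketch only: spark.executor.memoryOverhead adds off-heap headroom on top of
# the executor heap so YARN is less likely to kill the container (exit 143).
# The figures here are guesses, not tuned values.
spark = (SparkSession.builder
         .appName("csv_to_parquet_test")
         .config("spark.yarn.queue", "default")
         .config("spark.executor.memory", "24g")
         .config("spark.executor.memoryOverhead", "4g")
         .getOrCreate())

df = (spark.read
      .option("delimiter", "|")
      .option("encoding", "utf-16")
      .option("multiLine", True)
      .csv("/HDFS/Data/Test/Original/MyData_data/Test _2003.txt", header=True))

# multiLine forces the whole file into one partition; spreading the rows across
# more partitions keeps any single task under the container memory limit.
df.repartition(32).write.mode("overwrite").parquet("/tmp/Test _2003")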


THANKS! CLAY



Re: Problems installing Hadoop on Windows Server 2012 R2

2017-11-11 Thread Clay McDonald

So funny! Even Microsoft has moved on to Linux. All cloud servers are Linux, and 
now all of .NET has been ported over to Linux. We have started installing SQL 
Server on Linux. I've worked on Windows my whole career, but now I'm studying for 
my Linux Admin exam. Windows as a server OS is starting to die out.

On Nov 11, 2017, at 12:08 PM, Dr. Tibor Kurina wrote:

Exactly...
Why, for the HELL, are you trying to install Hadoop on Windows...?

Sent from Mail for Windows 10

From: Pavel Drankov
Sent: Saturday, November 11, 2017 18:06
Cc: user@hadoop.apache.org
Subject: Re: Problems installing Hadoop on Windows Server 2012 R2

Hi,

Why are you trying to run it on Windows? It is not recommended.

Best wishes,
Pavel

On 10 November 2017 at 04:44, Iván Galaviz wrote:
Hi,
I'm having a lot of problems installing Hadoop on Windows Server 2012 R2.
I'm currently trying to install it with these programs:
JDK 8u151
Maven 3.5.2
ProtocolBuffer 2.5.0
Cmake 3.9.5
Cygwin 2.9.0
Visual Studio 2017
All of these are installed on an x64 computer.
This was the last error I found:
 [exec] Project 
"C:\hdp\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\CMakeFiles\3.9.5\VCTargetsPath.vcxproj"
 on node 1 (default targets).
 [exec] C:\Progra~2\Microsoft Visual 
Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.Cpp.Current.targets(64,5):
 error MSB4062: The "SetEnv" task could not be loaded from the assembly 
C:\Program Files (x86)\Microsoft Visual 
Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.Build.CppTasks.Common.dll.
 Could not load file or assembly 'Microsoft.Build.Utilities.Core, 
Version=15.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of 
its dependencies. The system cannot find the file specified. Confirm that the 
<UsingTask> declaration is correct, that the assembly and all its dependencies 
are available, and that the task contains a public class that implements 
Microsoft.Build.Framework.ITask. 
[C:\hdp\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\CMakeFiles\3.9.5\VCTargetsPath.vcxproj]
 [exec] Done Building Project 
"C:\hdp\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\CMakeFiles\3.9.5\VCTargetsPath.vcxproj"
 (default targets) -- FAILED.

 As you can see, I'm having problems especially with VS.

Should I install older versions of these programs or something else?

Thank you in advance.






Maps stuck on Pending

2014-03-27 Thread Clay McDonald
Hi all, I have a job running with 1750 maps and 1 reduce and the status has 
been the same for the last two hours. Any thoughts?

Thanks, Clay

RE: Maps stuck on Pending

2014-03-27 Thread Clay McDonald
Thanks Serge, looks like I need to add memory to my datanodes.

Clay McDonald 
Cell: 202.560.4101 
Direct: 202.747.5962 

-Original Message-
From: Serge Blazhievsky [mailto:hadoop...@gmail.com] 
Sent: Thursday, March 27, 2014 2:16 PM
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Maps stuck on Pending

Next step would be to look in the logs under userlog directory for that job 

Sent from my iPhone

 On Mar 27, 2014, at 11:08 AM, Clay McDonald stuart.mcdon...@bateswhite.com 
 wrote:
 
 Hi all, I have a job running with 1750 maps and 1 reduce and the status has 
 been the same for the last two hours. Any thoughts?
 
 Thanks, Clay


RE: NodeManager health Question

2014-03-14 Thread Clay McDonald
Thanks Rohith, I restarted the datanodes and all is well.



From: Rohith Sharma K S [mailto:rohithsharm...@huawei.com]
Sent: Thursday, March 13, 2014 10:56 PM
To: user@hadoop.apache.org
Subject: RE: NodeManager health Question

Hi,

As troubleshooting, a few things you can verify:

1. Check the RM web UI for whether there are any 'Active Nodes' in the YARN 
cluster: http://<yarn.resourcemanager.webapp.address>/cluster.

Also check for Lost Nodes, Unhealthy Nodes, or Rebooted Nodes.
If there are any active nodes, then cross-verify the Memory Total. It should be 
Memory Total = Number of Active Nodes * value of { yarn.nodemanager.resource.memory-mb }.

2. NodeManager logs give more information; check the NM logs as well.
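
If it is easier to script the check, the same numbers are exposed by the RM REST 
API at /ws/v1/cluster/metrics. A minimal sketch (the hostname and port below are 
placeholders for your yarn.resourcemanager.webapp.address value, not values from 
this thread):

import json
from urllib.request import urlopen

# Placeholder RM address; use your yarn.resourcemanager.webapp.address value.
RM = "http://resourcemanager.example.com:8088"

with urlopen(RM + "/ws/v1/cluster/metrics") as resp:
    metrics = json.load(resp)["clusterMetrics"]

# Active nodes and total memory should match the figures on the RM web UI.
print("active nodes:   ", metrics["activeNodes"])
print("unhealthy nodes:", metrics["unhealthyNodes"])
print("lost nodes:     ", metrics["lostNodes"])
print("total memory MB:", metrics["totalMB"])
print("available MB:   ", metrics["availableMB"])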

 In Yarn, my Hive queries are Accepted but are Unassigned and do not run
 This may mean your YARN cluster does not have enough memory to 
launch containers. Possible reasons could be:

1. None of the NMs are sending heartbeats to the RM (check the RM web UI for 
Unhealthy Nodes).

2. All the NMs are lost/unhealthy.

3. The full cluster capacity is used, so the YARN scheduler is waiting for some 
container to finish before it can assign the released memory to other containers.

Looking at your DataNode socket timeout exception (that too, 8 minutes!), I 
suspect the Hadoop cluster network is unstable. Better to debug the network.


Thanks & Regards
Rohith Sharma K S

From: Clay McDonald [mailto:stuart.mcdon...@bateswhite.com]
Sent: 14 March 2014 01:30
To: 'user@hadoop.apache.org'
Subject: NodeManager health Question

Hello all, I have laid out my POC in a project plan and have HDP 2.0 installed. 
HDFS is running fine and have loaded up about 6TB of data to run my test on. I 
have a series of SQL queries that I will run in Hive ver. 0.12.0. I had to 
manually install Hue and still have a few issues I'm working on there. But at 
the moment, my most pressing issue is with Hive jobs not running. In Yarn, my 
Hive queries are Accepted but are Unassigned and do not run. See attached.

In Ambari, the datanodes all have the following error; NodeManager health CRIT 
for 20 days CRITICAL: NodeManager unhealthy

From the datanode logs I found the following;

ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
dc-bigdata1.bateswhite.com:50010:DataXceiver error processing READ_BLOCK 
operation  src: /172.20.5.147:51299 dest: /172.20.5.141:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
local=/172.20.5.141:50010 remote=/172.20.5.147:51299]
at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)

Also, in the namenode log I see the following;

2014-03-13 13:50:57,204 WARN  security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1355)) - No groups available for user 
dr.who


If anyone can point me in the right direction to troubleshoot this, I would 
really appreciate it!

Thanks! Clay


RE: NodeManager health Question

2014-03-14 Thread Clay McDonald
What do you want to know? Here is how it goes:


1. We receive 6TB from an outside client and need to analyze the data 
quickly and report on our findings. I'm using an analysis that was done in our 
current environment with the same data.

2. Upload the data to HDFS with -put.

3. Create external tables in Hive over the data in HDFS with STORED 
AS TEXTFILE LOCATION. (SQL is required for our analysts.)

4. Convert the current SQL to HiveQL and run the analysis.

5. Test ODBC connections to Hive for pulling data.
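
For the ODBC check in step 5, something like this minimal pyodbc sketch is what I 
have in mind (the DSN and table name are placeholders, not our real ones, and it 
assumes a Hive ODBC driver and DSN are already configured):

import pyodbc

# Placeholder DSN and table; this only verifies that the ODBC path to Hive works.
conn = pyodbc.connect("DSN=HiveDSN", autocommit=True)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM my_external_table")
print(cur.fetchone()[0])
cur.close()
conn.close()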

Clay


From: ados1...@gmail.com [mailto:ados1...@gmail.com]
Sent: Friday, March 14, 2014 11:40 AM
To: user
Subject: Re: NodeManager health Question

Hey Clay,

How have you loaded 6TB data into HDP? I am in a similar situation and wanted 
to understand your use case.

On Thu, Mar 13, 2014 at 3:59 PM, Clay McDonald 
stuart.mcdon...@bateswhite.com wrote:
Hello all, I have laid out my POC in a project plan and have HDP 2.0 installed. 
HDFS is running fine and have loaded up about 6TB of data to run my test on. I 
have a series of SQL queries that I will run in Hive ver. 0.12.0. I had to 
manually install Hue and still have a few issues I'm working on there. But at 
the moment, my most pressing issue is with Hive jobs not running. In Yarn, my 
Hive queries are Accepted but are Unassigned and do not run. See attached.

In Ambari, the datanodes all have the following error; NodeManager health CRIT 
for 20 days CRITICAL: NodeManager unhealthy

From the datanode logs I found the following;

ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
dc-bigdata1.bateswhite.com:50010:DataXceiver error processing READ_BLOCK 
operation  src: /172.20.5.147:51299 dest: /172.20.5.141:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
local=/172.20.5.141:50010 remote=/172.20.5.147:51299]
at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)

Also, in the namenode log I see the following;

2014-03-13 13:50:57,204 WARN  security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1355)) - No groups available for user 
dr.who


If anyone can point me in the right direction to troubleshoot this, I would 
really appreciate it!

Thanks! Clay



RE: NodeManager health Question

2014-03-14 Thread Clay McDonald
Also, I too created all my processes and SQL in Hortonworks' sandbox with small 
sample data. Then we created 7 VMs and attached enough storage to handle the 
full dataset test. I installed and configured CentOS and installed Hortonworks 
HDP 2.0 using Ambari. The cluster is 4 datanodes and 3 master nodes. Also, Hue 
comes with the sandbox, but when you install with Ambari, Hue is not installed, 
so you have to install Hue manually. Now I'm running queries on the 
full dataset. Clay


From: Clay McDonald [mailto:stuart.mcdon...@bateswhite.com]
Sent: Friday, March 14, 2014 12:52 PM
To: 'user@hadoop.apache.org'
Subject: RE: NodeManager health Question

What do you want to know? Here is how it goes:


1. We receive 6TB from an outside client and need to analyze the data 
quickly and report on our findings. I'm using an analysis that was done in our 
current environment with the same data.

2. Upload the data to HDFS with -put.

3. Create external tables in Hive over the data in HDFS with STORED 
AS TEXTFILE LOCATION. (SQL is required for our analysts.)

4. Convert the current SQL to HiveQL and run the analysis.

5. Test ODBC connections to Hive for pulling data.

Clay


From: ados1...@gmail.com [mailto:ados1...@gmail.com]
Sent: Friday, March 14, 2014 11:40 AM
To: user
Subject: Re: NodeManager health Question

Hey Clay,

How have you loaded 6TB data into HDP? I am in a similar situation and wanted 
to understand your use case.

On Thu, Mar 13, 2014 at 3:59 PM, Clay McDonald 
stuart.mcdon...@bateswhite.com wrote:
Hello all, I have laid out my POC in a project plan and have HDP 2.0 installed. 
HDFS is running fine and have loaded up about 6TB of data to run my test on. I 
have a series of SQL queries that I will run in Hive ver. 0.12.0. I had to 
manually install Hue and still have a few issues I'm working on there. But at 
the moment, my most pressing issue is with Hive jobs not running. In Yarn, my 
Hive queries are Accepted but are Unassigned and do not run. See attached.

In Ambari, the datanodes all have the following error; NodeManager health CRIT 
for 20 days CRITICAL: NodeManager unhealthy

From the datanode logs I found the following;

ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
dc-bigdata1.bateswhite.com:50010:DataXceiver error processing READ_BLOCK 
operation  src: /172.20.5.147:51299 dest: /172.20.5.141:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
local=/172.20.5.141:50010 remote=/172.20.5.147:51299]
at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)

Also, in the namenode log I see the following;

2014-03-13 13:50:57,204 WARN  security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1355)) - No groups available for user 
dr.who


If anyone can point me in the right direction to troubleshoot this, I would 
really appreciate it!

Thanks! Clay



HDP 2.0 REST API

2014-01-29 Thread Clay McDonald

Hello, I'm attempting to access HDFS from my browser, but when I go to the URL 
http://dc-bigdata5.bateswhite.com:50070/webhdfs/v1/

I get the following error;

{"RemoteException":{"exception":"UnsupportedOperationException","javaClassName":"java.lang.UnsupportedOperationException","message":"op=NULL is not supported"}}

Any ideas???
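
For comparison, WebHDFS expects an explicit op query parameter on every call. A 
minimal sketch of a directory listing against the same host (LISTSTATUS is just 
one example operation, and the root path is illustrative):

import json
from urllib.request import urlopen

# op is required; LISTSTATUS lists the directory given after /webhdfs/v1.
url = "http://dc-bigdata5.bateswhite.com:50070/webhdfs/v1/?op=LISTSTATUS"
with urlopen(url) as resp:
    listing = json.load(resp)
print(json.dumps(listing, indent=2))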

Thanks,
Clay McDonald
Database Administrator
Bates White, LLC
1300 Eye St, NW, Suite 600 East
Washington, DC 20005
Main: 202.408.6110
Cell: 202.560.4101
Direct: 202.747.5962
Email: clay.mcdon...@bateswhite.com



Re: HDP 2.0 Install fails on repo unavailability

2013-10-24 Thread Clay McDonald
I think I have the fix for this. I'll check when I get home.

Clay McDonald
Sent from my iPhone

On Oct 24, 2013, at 7:36 PM, Hitesh Shah hit...@apache.org wrote:

 BCC'ing user@hadoop.
 
 This is a question for the ambari mailing list. 
 
 -- Hitesh 
 
 On Oct 24, 2013, at 3:36 PM, Jain, Prem wrote:
 
 Folks,
 
 Trying to install the newly released Hadoop 2.0 using Ambari. I am able to 
 install Ambari, but when I try to install Hadoop 2.0 on the rest of the cluster, 
 the installation fails with errors about repo mirror unavailability. Not sure 
 where I messed up.
 
 Here are the error messages
 
 Output log from AMBARI during Installation
 
 notice: 
 /Stage[1]/Hdp::Snappy::Package/Hdp::Snappy::Package::Ln[32]/Hdp::Exec[hdp::snappy::package::ln
  32]/Exec[hdp::snappy::package::ln 32]/returns: executed successfully
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-libhdfs]/ensure: 
 change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 
 -y install hadoop-libhdfs' returned 1:
 
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
  zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
  hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
 
 
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-lzo]/ensure: change 
 from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y 
 install hadoop-lzo' returned 1:
 
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
  zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
 
 
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[hdfs-site]/File[/etc/hadoop/conf/hdfs-site.xml]/content:
  content changed '{md5}117224b1cf67c151f8a3d7ac0a157fa5' to 
 '{md5}ba383a94bdde1a0b2eb5c59b1f5b61e7'
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[capacity-scheduler]/File[/etc/hadoop/conf/capacity-scheduler.xml]/content:
  content changed '{md5}08d7e952b3e2d4fd5a2a880dfcd3a2df' to 
 '{md5}dd3922fc27f72cd78cf2b47f57351b08'
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[core-site]/File[/etc/hadoop/conf/core-site.xml]/content:
  content changed '{md5}76d06ebce1310be7e65ae0c7e8c3068a' to 
 '{md5}1b626aa016a6f916271f67f3aa22cbbb'
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop]/ensure: change from 
 absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install 
 hadoop' returned 1:
 
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
  zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
 
 
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop-libhdfs] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop-lzo] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
  Dependency Package[hadoop-libhdfs] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
  Dependency Package[hadoop-lzo] has failures: true
 
 Repo :
 
 http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.25
 <Error><Code>NoSuchKey</Code><Message>The specified key does not 
 exist.</Message><Key>ambari/centos6/1.x/updates/1.4.1.25</Key><RequestId>4693487CE703DB53</RequestId><HostId>B87iAHvcpH7im27HOuEKBJ0F+qPFf+7aXuTe+O7OhLb9WscyxTbV/2yUPXO+KPOJ</HostId></Error>
 
 
 Manual install:
 
 [root@dn5 ~]# /usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6
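
A quick way to confirm whether the repo path from the error above is actually 
reachable is a check like this sketch (repodata/repomd.xml is the standard yum 
metadata file, not a path taken from the thread):

from urllib.request import urlopen
from urllib.error import HTTPError, URLError

# Reachability check for the repo baseurl from the error above; a 403/404 here
# (as the NoSuchKey response suggests) points at a wrong or retired repo path
# rather than a local yum misconfiguration.
url = ("http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/"
       "1.4.1.25/repodata/repomd.xml")
try:
    with urlopen(url, timeout=10) as resp:
        print("HTTP", resp.status)
except HTTPError as e:
    print("HTTP error:", e.code)
except URLError as e:
    print("unreachable:", e.reason)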

RE: Hadoop Installation on Ubuntu

2012-07-03 Thread Clay McDonald
I used hduser to do the install.

You need to add both the hduser user and hadoop group, use the following;
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

To check if hduser belongs to the hadoop group, use the following;
hduser@Bigdata:~$ groups
hduser adm dialout cdrom plugdev lpadmin sambashare admin

add existing hduser to hadoop group.
hduser@Bigdata:~$ sudo usermod -g hadoop hduser
hduser@Bigdata:~$ groups
hduser hadoop adm dialout cdrom plugdev lpadmin sambashare admin

If you need to change the directory ownership, use;
hduser@Bigdata:~$ sudo chown -R hduser:hadoop /usr/local/hadoop/

Clay McDonald


-Original Message-
From: Ying Huang [mailto:huangyi...@gmail.com] 
Sent: Tuesday, July 03, 2012 11:03 PM
To: common-user@hadoop.apache.org
Subject: Hadoop Installation on Ubuntu

Hello,
 I have downloaded hadoop_1.0.3-1_x86_64.deb from the Hadoop official website 
and installed it using the following command under root privileges:
 dpkg -i hadoop_1.0.3-1_x86_64.deb
 But there is an error: chown: invalid group: `root:hadoop'.
 And later, when I ran hadoop-setup-single-node.sh, I got lots of chown 
errors:

Proceed with setup? (y/n) y
chown: invalid user: `mapred:hadoop'
chown: invalid user: `hdfs:hadoop'
chown: invalid user: `mapred:hadoop'
chown: invalid user: `hdfs:hadoop'
chown: invalid group: `root:hadoop'
chown: invalid user: `hdfs:hadoop'
chown: invalid user: `mapred:hadoop'
chown: invalid group: `root:hadoop'
chown: invalid group: `root:hadoop'
chown: invalid group: `root:hadoop'
 My java version is

java version 1.6.0_24
OpenJDK Runtime Environment (IcedTea6 1.11.1) (6b24-1.11.1-4ubuntu3) OpenJDK 
64-Bit Server VM (build 20.0-b12, mixed mode)

 Should I manually create the group hadoop? If so, with what command and what 
permission mode, 444 or 777?
 And is there a detailed document about how to install and set up using the deb file?


-- Best Regards
Ying Huang




RE: Hadoop Installation on Ubuntu

2012-07-03 Thread Clay McDonald
Corrected spacing for better readability.


I used hduser to do the install.

You need to add both the hduser user and hadoop group, use the following; 

$ sudo addgroup hadoop 

$ sudo adduser --ingroup hadoop hduser



To check if hduser belongs to the hadoop group, use the following; 

hduser@Bigdata:~$ groups 


add existing hduser to hadoop group.

hduser@Bigdata:~$ sudo usermod -g hadoop hduser 

hduser@Bigdata:~$ groups 



If you need to change the directory ownership, use; 

hduser@Bigdata:~$ sudo chown -R hduser:hadoop /usr/local/hadoop/



Clay McDonald


-Original Message-
From: Ying Huang [mailto:huangyi...@gmail.com]
Sent: Tuesday, July 03, 2012 11:03 PM
To: common-user@hadoop.apache.org
Subject: Hadoop Installation on Ubuntu

Hello,
 I have downloaded hadoop_1.0.3-1_x86_64.deb from the Hadoop official website 
and installed it using the following command under root privileges:
 dpkg -i hadoop_1.0.3-1_x86_64.deb
 But there is an error: chown: invalid group: `root:hadoop'.
 And later, when I ran hadoop-setup-single-node.sh, I got lots of chown 
errors:

Proceed with setup? (y/n) y
chown: invalid user: `mapred:hadoop'
chown: invalid user: `hdfs:hadoop'
chown: invalid user: `mapred:hadoop'
chown: invalid user: `hdfs:hadoop'
chown: invalid group: `root:hadoop'
chown: invalid user: `hdfs:hadoop'
chown: invalid user: `mapred:hadoop'
chown: invalid group: `root:hadoop'
chown: invalid group: `root:hadoop'
chown: invalid group: `root:hadoop'
 My java version is

java version 1.6.0_24
OpenJDK Runtime Environment (IcedTea6 1.11.1) (6b24-1.11.1-4ubuntu3) OpenJDK 
64-Bit Server VM (build 20.0-b12, mixed mode)

 Should I manually create the group hadoop? If so, with what command and what 
permission mode, 444 or 777?
 And is there a detailed document about how to install and set up using the deb file?


-- Best Regards
Ying Huang