Problems installing Hadoop on Windows 7 Enterprise N

2015-03-09 Thread Daniël van Dam
Hi,

Currently I'm trying to install Hadoop 2.6.0 on a Windows 7 Enterprise N computer.
I used the following guide
http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os

I already fixed some previous problems by installing the correct version of
Protocol Buffers, but now I've run into a new problem.
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec 
(compile-ms-winutils) on project hadoop-common: Command execution failed. 
Process exited with an error: 1(Exit value: 1) -> [Help 1]

I searched the internet and found that MSBuild might be causing the problem,
but I checked my PATH and it points to the correct MSBuild location. I'm using
.NET Framework 4.0.
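
For reference, BUILDING.txt in the Hadoop source tree says the Windows native
pieces (winutils is what the failing compile-ms-winutils goal builds) must be
compiled from a Windows SDK / Visual Studio command prompt with the Platform
environment variable set. A sketch of the usual invocation, to adapt to your
own setup:

    set Platform=x64
    mvn package -Pdist,native-win -DskipTests -Dtar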

I tried running "mvn clean install" and got the following errors from the tests.

---
Test set: org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator
---
Tests run: 14, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 33.35 sec <<< FAILURE! - in org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator
testAuthenticationHttpClientPost[0](org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator)  Time elapsed: 2.201 sec  <<< ERROR!
org.apache.http.client.ClientProtocolException: null
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:693)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.hadoop.security.authentication.client.AuthenticatorTestCase.doHttpClientRequest(AuthenticatorTestCase.java:265)
    at org.apache.hadoop.security.authentication.client.AuthenticatorTestCase._testAuthenticationHttpClient(AuthenticatorTestCase.java:291)
    at org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator$4.call(TestKerberosAuthenticator.java:160)
    at org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator$4.call(TestKerberosAuthenticator.java:157)
    at org.apache.hadoop.security.authentication.KerberosTestUtils$1.run(KerberosTestUtils.java:102)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.authentication.KerberosTestUtils.doAs(KerberosTestUtils.java:99)
    at org.apache.hadoop.security.authentication.KerberosTestUtils.doAsClient(KerberosTestUtils.java:115)
    at org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator.testAuthenticationHttpClientPost(TestKerberosAuthenticator.java:157)

testAuthenticationHttpClientPost[1](org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator)  Time elapsed: 1.914 sec  <<< ERROR!
org.apache.http.client.ClientProtocolException: null
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:693)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.hadoop.security.authentication.client.AuthenticatorTestCase.doHttpClientRequest(AuthenticatorTestCase.java:265)
    at org.apache.hadoop.security.authentication.client.AuthenticatorTestCase._testAuthenticationHttpClient(AuthenticatorTestCase.java:291)
    at org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator$4.call(TestKerberosAuthenticator.java:160)
    at org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator$4.call(TestKerberosAuthenticator.java:157)
    at org.apache.hadoop.security.authentication.KerberosTestUtils$1.run(KerberosTestUtils.java:102)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.authentication.KerberosTestUtils.doAs(KerberosTestUtils.java:99)
    at org.apache.had

How do reduce tasks know which partition they should read?

2015-03-09 Thread xeonmailinglist-gmail

Hi,

I am looking at the YARN MapReduce internals to try to understand how
reduce tasks know which partition of the map output they should read,
even when they re-execute after a crash.


I am also looking at the MapReduce source code. Is there any class I
should look at to understand this?


Any help?

Thanks





Which file are map and reduce reading or writing?

2015-03-09 Thread xeonmailinglist-gmail

Hi,

I am looking at YARN MapReduce internals, and I would like to know whether it
is possible to tell which file a map/reduce function is reading or writing,
either from inside the user-defined map or reduce function, or simply from
the client?



Thanks,




Re: Which file are map and reduce reading or writing?

2015-03-09 Thread Kai Voigt
Hi,

The context object (passed to map(), reduce() and setup()) contains information
about the input split, such as the file name.

Off the top of my head: String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();

Kai
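
A slightly fuller sketch of that idea (hypothetical class name; it assumes the
job uses a file-based input format such as FileInputFormat, so the cast to
FileSplit is valid):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {
        private Text fileName;

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // The cast only works for file-based splits; other input formats
            // hand the mapper a different InputSplit subclass.
            fileName = new Text(((FileSplit) context.getInputSplit()).getPath().getName());
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Tag each record with the name of the file it came from.
            context.write(fileName, value);
        }
    }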

> On 09.03.2015 at 16:39, xeonmailinglist-gmail wrote:
> 
> Hi,
> 
> I am looking at YARN MapReduce internals, and I would like to know whether it
> is possible to tell which file a map/reduce function is reading or writing,
> either from inside the user-defined map or reduce function, or simply from
> the client?
> 
> 
> Thanks,
> 

Kai Voigt
Am Germaniahafen 1
24143 Kiel
Germany
k...@123.org
+49 160 96683050
@KaiVoigt



Re: Which file are map and reduce reading or writing?

2015-03-09 Thread xeonmailinglist-gmail
The Reducer doesn't have ((FileSplit) context.getInputSplit()); just the
mapper does.


I would like to know, from Java, which files a reducer is reading and where
it is writing. Can I do this?


Thanks,
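
One hedged possibility, a sketch assuming the standard FileOutputFormat: the
reducer's input comes from the shuffle rather than from files it opens itself,
so only the output side maps cleanly to a path, which can be inspected from
the context:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WhereAmIReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // Task-attempt work directory under the job's output directory
            Path workPath = FileOutputFormat.getWorkOutputPath(context);
            // By convention the final file is part-r-NNNNN, where NNNNN is
            // this reducer's partition number.
            int partition = context.getTaskAttemptID().getTaskID().getId();
            System.err.printf("reducer %05d writes under %s%n", partition, workPath);
        }
    }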

On 09-03-2015 15:44, Kai Voigt wrote:

Hi,

The context object (passed to map(), reduce() and setup()) contains
information about the input split, such as the file name.


Off the top of my head: String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();


Kai

On 09.03.2015 at 16:39, xeonmailinglist-gmail <xeonmailingl...@gmail.com> wrote:


Hi,

I am looking at YARN MapReduce internals, and I would like to know whether it
is possible to tell which file a map/reduce function is reading or writing,
either from inside the user-defined map or reduce function, or simply from
the client?



Thanks,






Kai Voigt
Am Germaniahafen 1
24143 Kiel
Germany
k...@123.org
+49 160 96683050
@KaiVoigt






Forcing kerberos principals to lowercase

2015-03-09 Thread Matt Davies
Hey everyone,

I have a situation where the case sensitivity of principals is causing some
pain. Here's the setup:

AD
Centrify (Centrify.com)
Hadoop on RHEL

Logins to the cluster all use LDAP/PAM. When I run "id", it returns the
lowercase version of the user; "env" returns the uppercase version.

For example:
id
uid=1234(user1234)

env | grep USER_PRIN
USER_PRINCIPAL_NAME=USER1234@REALM

When user1234 tries to write to HDFS with a home directory of
/user/user1234, I receive a permission error. If I create it as
/user/USER1234, then it behaves correctly.

If I were to just push a file to HDFS /tmp, it looks like:
-rw-r--r--   3 USER1234 hdfs       1234 2005-03-07 15:34 /tmp/file

To make matters a little more complicated, some usernames are mixed case, so
blindly forcing either upper or lower is not sufficient. I just want to force
usernames to lowercase.

So my question is this: is there a way to tell the Kerberos setup in HDFS
to force lowercase usernames (matching what Linux does), so that admins do
not go crazy figuring out silly things like case?

Thanks in advance.
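
For what it's worth, one avenue to check, hedged since it depends on the
Hadoop version in use: the hadoop.security.auth_to_local rules in
core-site.xml support an /L flag that lowercases the translated short name.
A sketch, with REALM standing in for the actual realm:

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](.*@REALM)s/@.*///L
        RULE:[2:$1@$0](.*@REALM)s/@.*///L
        DEFAULT
      </value>
    </property>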


Re: Forcing kerberos principals to lowercase

2015-03-09 Thread Matt Davies
Sorry - I forgot to mention that I've seen the configuration with Hue, but
I want plain CLI access to work.

-Matt

On Mon, Mar 9, 2015 at 12:01 PM, Matt Davies  wrote:

> Hey everyone,
>
> I have a situation where the case sensitivity of principals is causing some
> pain. [...]


Don't delete temp files

2015-03-09 Thread xeonmailinglist-gmail

Hi,

During a MapReduce execution, some temporary configuration files are
created. I want these temp files to be kept when a job ends. How do I
configure this?


Thanks
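
A hedged pointer: MR2's mapred-default.xml lists two properties for
preserving intermediate task files; whether they cover every temporary file
a job creates is not guaranteed. A sketch for mapred-site.xml:

    <property>
      <!-- keep intermediate files of failed task attempts -->
      <name>mapreduce.task.files.preserve.failedtasks</name>
      <value>true</value>
    </property>
    <property>
      <!-- keep files for any task whose name matches this pattern -->
      <name>mapreduce.task.files.preserve.filepattern</name>
      <value>.*</value>
    </property>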




Re: Problems installing Hadoop on Windows 7 Enterprise N

2015-03-09 Thread Xuan Gong
Hey,
You might need to skip all the unit tests to finish your build.
To do that, you can add -DskipTests.


Thanks
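
For example:

    mvn clean install -DskipTests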


From: Daniël van Dam <daniel.van...@ortec-finance.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Monday, March 9, 2015 at 6:01 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Problems installing Hadoop on Windows 7 Enterprise N

Hi,

Currently I'm trying to install Hadoop 2.6.0 on a Windows 7 Enterprise N computer.
[...]

Re: What skills to Learn to become Hadoop Admin

2015-03-09 Thread max scalf
Hi Jay,

Is there a blog or anything that talks about setting up this BigPetStore
application? I looked at the Git README file and was a little bit lost.
Maybe that's because I am new to Hadoop.

On Sat, Mar 7, 2015 at 10:34 AM, jay vyas wrote:

> Setting up vendor distros is a great first step.
>
> 1) Running TeraSort and benchmarking is a good step (example commands are
> at the end of this message).  You can also run
> larger, full stack hadoop applications like bigpetstore, which we curate
> here : https://github.com/apache/bigtop/tree/master/bigtop-bigpetstore/.
>
> 2) Write some mapreduce or spark jobs which write data to a persistent
> transactional store, such as SOLR or HBase.  This is a hugely important
> part of real world hadoop administration, where you will encounter problems
> like running out of memory, possibly CPU overclocking on some nodes, and so
> on.
>
> 3) Now, did you want to go deeper into the build/setup/deployment of
> hadoop?  It's worth it to try building/deploying/debugging hadoop ecosystem
> components from scratch, by setting up Apache BigTop, which packages
> RPM/DEB artifacts and provides puppet recipes for distributions.  It's the
> original roots of both the cloudera and hortonworks distributions, so you
> will learn something about both by playing with it.
>
> We have some exercises you can use to guide you and get started:
> https://cwiki.apache.org/confluence/display/BIGTOP/BigTop+U%3A+Exersizes
> Feel free to join the mailing list for questions.
>
>
>
>
> On Sat, Mar 7, 2015 at 9:32 AM, max scalf  wrote:
>
>> Krish,
>>
>> I don't mean to hijack your mail here, but I wanted to find out how/what
>> you did for the portion below, as I am trying to go down your path as well.
>> I was able to get a 4-5 node cluster up using Ambari and CDH, and now I
>> want to take it to the next level.  What have you done for the below?
>>
>> "I have done a web log integration using flume and twitter sentiment
>> analysis."
>>
>> On Sat, Mar 7, 2015 at 12:11 AM, Krish Donald wrote:
>>
>>> Hi,
>>>
>>> I would like to enter the Big Data world as a Hadoop Admin, and I have
>>> set up a 7-node cluster using Ambari, Cloudera Manager and Apache Hadoop.
>>> I have installed services like Hive, Oozie, ZooKeeper, etc.
>>>
>>> I have done a web log integration using flume and twitter sentiment
>>> analysis.
>>>
>>> I wanted to understand what are the other skills I should learn ?
>>>
>>> Thanks
>>> Krish
>>>
>>
>>
>
>
> --
> jay vyas
>
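
As referenced in step 1 above, a typical TeraSort benchmark run looks like
the following; the 10,000,000 rows are 100 bytes each (about 1 GB), and the
HDFS paths are made up:

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 /bench/tera-in
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /bench/tera-in /bench/tera-out
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teravalidate /bench/tera-out /bench/tera-report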


Tasks stuck in UNASSIGNED state - Hadoop 1.2.1

2015-03-09 Thread Premal Shah
Hi,
We have a Hadoop 1.2.1 cluster running on EC2. The job tracker is an
On-Demand box, while the task trackers are Spot Instances, all set up from
an AMI. There are normally 100 instances running, with about 1500 map and
reduce slots. We run hundreds of Hive queries every day; there are no
custom map-reduce jobs.

In the past few days, queries have been running slower than before. On
investigation we found that tasks are not starting even though there are
free map and reduce slots. CPU load on the task trackers is around 50%, the
load average is lower than the number of cores, and memory utilization is
nowhere close to the available max. Tasks get stuck in the UNASSIGNED state
for a few minutes before they start, or they fail to launch and then start
after a few minutes of timeout.

The jobtracker is running at around 20% CPU, with 1 GB of its 4 GB Xmx heap used.

This causes jobs to take longer to start, and therefore longer to finish.

Is there something we can do to debug and fix this issue?


-- 
Regards,
Premal Shah.
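
A generic first step, not specific to this failure mode: capture thread
dumps of the JobTracker and an affected TaskTracker while tasks sit in
UNASSIGNED, and see what the scheduler and task-launcher threads are blocked
on. A sketch, assuming the daemons run as the current user:

    jstack $(pgrep -f JobTracker) > jobtracker.jstack
    jstack $(pgrep -f TaskTracker) > tasktracker.jstack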



Not able to ping AWS host

2015-03-09 Thread Krish Donald
Hi,

I am trying to set up a Hadoop cluster on AWS.
After creating an instance, I got the public IP and DNS.
But when I try to ping it from my Windows machine, I am not able to.

I am also not able to log on to the machine using PuTTY;
it says the network timed out.

The security group in the AWS cluster has all TCP, UDP, ICMP and SSH open.

Please let me know if anybody has any ideas.

Thanks
Krish


Re: How do reduce tasks know which partition they should read?

2015-03-09 Thread Vinod Kumar Vavilapalli

The reducers (Fetcher.java) simply ask the shuffle service
(ShuffleHandler.java) to give them the output corresponding to a specific
map. The partitioning detail is hidden from the reducers.

Thanks,
+Vinod
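
Concretely, in Hadoop 2 the fetch is an HTTP GET against the ShuffleHandler
(default port 13562). Illustrative only, with made-up IDs:

    http://nm-host:13562/mapOutput?job=job_1425900000000_0001&reduce=3&map=attempt_1425900000000_0001_m_000007_0

The reducer supplies just its own partition number (reduce=...) and the map
attempt IDs; the ShuffleHandler uses the per-map index file to translate
that partition number into a byte range of the map's output file. The same
lookup works unchanged when a reduce task re-executes after a crash, since
the re-run task keeps the same partition number.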

On Mar 9, 2015, at 7:56 AM, xeonmailinglist-gmail wrote:

> Hi,
>
> I am looking at the YARN MapReduce internals to try to understand how reduce
> tasks know which partition of the map output they should read, even when
> they re-execute after a crash. [...]



Re: Not able to ping AWS host

2015-03-09 Thread max scalf
When you say the security group has all ports open, is that open to the
public (0.0.0.0/0) or only to your specific IP (and if so, is your IP correct)?

Also, are the instances inside a VPC?

On Mon, Mar 9, 2015 at 5:05 PM, Krish Donald  wrote:

> Hi,
>
> I am trying to set up a Hadoop cluster on AWS. [...]


Re: Not able to ping AWS host

2015-03-09 Thread Krish Donald
Yes, the security group has all ports open to 0.0.0.0/0, and yes, the
cluster is inside a VPC.
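
A hedged check, since the instance is in a VPC: ping and SSH from outside
also require that the subnet's route table has an internet gateway route and
that the instance actually has a public IP. With the AWS CLI, using made-up
IDs:

    aws ec2 describe-route-tables --filters Name=vpc-id,Values=vpc-12345678
    # look for a 0.0.0.0/0 route pointing at an igw-...
    aws ec2 describe-instances --instance-ids i-12345678 \
      --query 'Reservations[].Instances[].PublicIpAddress'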

On Mon, Mar 9, 2015 at 5:15 PM, max scalf  wrote:

> When you say the security group has all ports open, is that open to the
> public (0.0.0.0/0) or only to your specific IP (and if so, is your IP correct)?
>
> Also, are the instances inside a VPC?
>
> [...]