Re: Hadoop Multithread MapReduce

2012-07-25 Thread liu zg
Hi Ken,
What do you mean by multiple threads?
I think you can run multiple map task processes on a node by configuring the
mapred.tasktracker.map.tasks.maximum property in mapred-site.xml.
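For example (just an illustration, and the value 4 is arbitrary), something
like this in mapred-site.xml on each tasktracker:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>

Note this controls how many map tasks (separate child JVM processes) a
tasktracker runs at once; it is not per-job multithreading.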

Gump


On Thu, Jul 26, 2012 at 10:56 AM, kenyh  wrote:

>
> Does anyone know about the feature for using multiple threads in a map task
> or
> reduce task?
> Is it a good way to use multithreading in a map task?
>
>


Re: Hadoop Multithread MapReduce

2012-07-25 Thread Harsh J
Hi,

We do have a Multithreaded Mapper implementation available for use.
Check out: 
http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html
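For reference, a minimal sketch of how one might wire MultithreadedMapper into
a job (the driver and mapper classes below are made-up placeholders, not code
from the Hadoop docs; the real map() must be thread-safe, because several
instances run concurrently inside a single map task):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultithreadedJobDriver {

  // Placeholder mapper; it must be thread-safe because MultithreadedMapper
  // runs several instances of it concurrently within one map task.
  public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(value, key); // stand-in for some I/O-bound per-record work
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "multithreaded map example");
    job.setJarByClass(MultithreadedJobDriver.class);

    // MultithreadedMapper is the task's mapper; it fans records out to a
    // thread pool, and each thread runs an instance of MyMapper.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, MyMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 8); // placeholder thread count

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

This mostly pays off when each map() call is I/O-bound (for example fetching
URLs); for CPU-bound work, more map tasks per node is usually the better knob.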

On Thu, Jul 26, 2012 at 8:26 AM, kenyh  wrote:
>
> Does anyone know about the feature for using multiple threads in a map task or
> reduce task?
> Is it a good way to use multithreading in a map task?
>



-- 
Harsh J


Hadoop Multithread MapReduce

2012-07-25 Thread kenyh

Does anyone know about the feature for using multiple threads in a map task or
reduce task?
Is it a good way to use multithreading in a map task?



Re: Re: HDFS block physical location

2012-07-25 Thread Todd Lipcon
Hi JS,

You may also be interested in the following JIRA which proposes an API
for block->disk mapping information:
https://issues.apache.org/jira/browse/HDFS-3672

There has been some discussion about potential use cases on this JIRA.
If you can describe what your use case is for this information on the
JIRA, we'd really appreciate the input.

-Todd

On Wed, Jul 25, 2012 at 3:20 PM, Chen He  wrote:
> For block to filename mapping, you can get from my previous answer.
>
> For block to harddisk mapping, you may need to traverse all the directory
> that used for HDFS, I am sure your OS has the information about which hard
> drive is mounted to which directory.
>
> with these two types of information, you can write a small Perl or Python
> script to get what you want.
>
> Or
>
> Take look of the namenode.java and see where and how it saves the table of
> block information.
>
> Please correct me if there is any mistake.
>
> Chen
>
>
> On Wed, Jul 25, 2012 at 6:10 PM, <20seco...@web.de> wrote:
>
>>
>> Thanks,
>> but that just gives me the hostnames or am I overlooking something?
>> I actually need the filename/harddisk on the node.
>>
>> JS
>>
>> Sent: Wednesday, 25 July 2012, 23:33
>> From: "Chen He" 
>> To: common-user@hadoop.apache.org
>> Subject: Re: HDFS block physical location
>> >nohup hadoop fsck / -files -blocks -locations
>> >cat nohup.out | grep [your block name]
>>
>> Hope this helps.
>>
>> On Wed, Jul 25, 2012 at 5:17 PM, <20seco...@web.de> wrote:
>>
>> > Hi,
>> >
>> > just a short question. Is there any way to figure out the physical
>> storage
>> > location of a given block?
>> > I don't mean just a list of hostnames (which I know how to obtain), but
>> > actually the file where it is being stored in.
>> > We use several hard disks for hdfs data on each node, and I would need to
>> > know which block ends up on which harddisk.
>> >
>> > Thanks!
>> > JS
>> >
>> >
>>
>>
>>
>>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Re: HDFS block physical location

2012-07-25 Thread Chen He
For the block-to-filename mapping, you can get it from my previous answer.

For the block-to-hard-disk mapping, you may need to traverse all the
directories used for HDFS; I am sure your OS has the information about which
hard drive is mounted to which directory.

With these two types of information, you can write a small Perl or Python
script to get what you want.
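If it helps, here is the same idea as a rough Java sketch instead of a
Perl/Python script (the class name and argument layout are made up for
illustration; you pass a block name and the node's comma-separated
dfs.data.dir value, then map the matching directory to a physical disk via
the OS mount table):

import java.io.File;

public class FindBlockOnDisk {

  public static void main(String[] args) {
    String blockName = args[0];              // e.g. "blk_1234567890123456789"
    String[] dataDirs = args[1].split(",");  // the node's dfs.data.dir value

    for (String dir : dataDirs) {
      File hit = search(new File(dir), blockName);
      if (hit != null) {
        System.out.println(blockName + " -> " + hit.getAbsolutePath());
      }
    }
  }

  // Recursively look for files whose names start with the block name
  // (this matches both the block data file and its .meta file).
  private static File search(File dir, String blockName) {
    File[] children = dir.listFiles();
    if (children == null) {
      return null;
    }
    for (File f : children) {
      if (f.isDirectory()) {
        File hit = search(f, blockName);
        if (hit != null) {
          return hit;
        }
      } else if (f.getName().startsWith(blockName)) {
        return f;
      }
    }
    return null;
  }
}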

Or

Take a look at NameNode.java and see where and how it saves the table of
block information.

Please correct me if there is any mistake.

Chen


On Wed, Jul 25, 2012 at 6:10 PM, <20seco...@web.de> wrote:

>
> Thanks,
> but that just gives me the hostnames or am I overlooking something?
> I actually need the filename/harddisk on the node.
>
> JS
>
> Sent: Wednesday, 25 July 2012, 23:33
> From: "Chen He" 
> To: common-user@hadoop.apache.org
> Subject: Re: HDFS block physical location
> >nohup hadoop fsck / -files -blocks -locations
> >cat nohup.out | grep [your block name]
>
> Hope this helps.
>
> On Wed, Jul 25, 2012 at 5:17 PM, <20seco...@web.de> wrote:
>
> > Hi,
> >
> > just a short question. Is there any way to figure out the physical
> storage
> > location of a given block?
> > I don't mean just a list of hostnames (which I know how to obtain), but
> > actually the file where it is being stored in.
> > We use several hard disks for hdfs data on each node, and I would need to
> > know which block ends up on which harddisk.
> >
> > Thanks!
> > JS
> >
> >
>
>
>
>


Re: Simple hadoop processes/testing on windows machine

2012-07-25 Thread Kai Voigt
I suggest using a virtual machine with all required services installed and 
configured.

Cloudera offers a distribution as a VM, at 
https://ccp.cloudera.com/display/SUPPORT/CDH+Downloads#CDHDownloads-CDH4PackagesandDownloads

So all you need to do is install VMware Player on your Windows box and deploy
that VM.

Kai

On 25.07.2012 at 22:49, "Brown, Berlin [GCG-PFS]" 
wrote:

> Is there a tutorial out there or quick startup that would allow me to:
> 
> 
> 
> 1.   Start my hadoop nodes by double clicking on a node, possibly a
> single node
> 
> 2.   Installing my Task
> 
> 3.   And then my client connects to those nodes.
> 
> 
> 
> I want to avoid having to install a hadoop windows service or any such
> installs.  Actually, I don't want to install anything for hadoop to run,
> I want to be able to unarchive the software and just double-click and
> run.  I also want to avoid additional hadoop windows registry or env
> params?
> 
> 
> 
> -
> 
> 
> 

-- 
Kai Voigt
k...@123.org






Aw: Re: HDFS block physical location

2012-07-25 Thread 20seconds

Thanks,
but that just gives me the hostnames, or am I overlooking something?
I actually need the filename/hard disk on the node.

JS

Sent: Wednesday, 25 July 2012, 23:33
From: "Chen He" 
To: common-user@hadoop.apache.org
Subject: Re: HDFS block physical location
>nohup hadoop fsck / -files -blocks -locations
>cat nohup.out | grep [your block name]

Hope this helps.

On Wed, Jul 25, 2012 at 5:17 PM, <20seco...@web.de> wrote:

> Hi,
>
> just a short question. Is there any way to figure out the physical storage
> location of a given block?
> I don't mean just a list of hostnames (which I know how to obtain), but
> actually the file where it is being stored in.
> We use several hard disks for hdfs data on each node, and I would need to
> know which block ends up on which harddisk.
>
> Thanks!
> JS
>
>





Re: HDFS block physical location

2012-07-25 Thread Chen He
>nohup hadoop fsck / -files -blocks -locations
>cat nohup.out | grep [your block name]

Hope this helps.

On Wed, Jul 25, 2012 at 5:17 PM, <20seco...@web.de> wrote:

> Hi,
>
> just a short question. Is there any way to figure out the physical storage
> location of a given block?
> I don't mean just a list of hostnames (which I know how to obtain), but
> actually the file where it is being stored in.
> We use several hard disks for hdfs data on each node, and I would need to
> know which block ends up on which harddisk.
>
> Thanks!
> JS
>
>


HDFS block physical location

2012-07-25 Thread 20seconds
Hi,

Just a short question: is there any way to figure out the physical storage
location of a given block?
I don't mean just a list of hostnames (which I know how to obtain), but the
actual file it is stored in.
We use several hard disks for HDFS data on each node, and I would need to know
which block ends up on which hard disk.

Thanks!
JS



Simple hadoop processes/testing on windows machine

2012-07-25 Thread Brown, Berlin [GCG-PFS]
Is there a tutorial out there or quick startup that would allow me to:

 

1.   Start my Hadoop nodes by double-clicking on a node, possibly a
single node

2.   Install my task

3.   Then have my client connect to those nodes.

 

I want to avoid having to install a Hadoop Windows service or any such
installs.  Actually, I don't want to install anything for Hadoop to run;
I want to be able to unarchive the software and just double-click and
run.  I also want to avoid additional Hadoop Windows registry or environment
parameters.

 

-

 



Re: Regarding DataJoin contrib jar for 1.0.3

2012-07-25 Thread Edward Capriolo
DataJoin is example code. Most people doing joins use Hive or Pig rather
than coding them up themselves.


On Tue, Jul 24, 2012 at 5:19 PM, Abhinav M Kulkarni
 wrote:
> Hi,
>
> Do we not have any info on this? Join must be such a common scenario for
> most of the people out on this list.
>
> Thanks.
>
> On 07/22/2012 10:22 PM, Abhinav M Kulkarni wrote:
>>
>> Hi,
>>
>> I was planning to use DataJoin jar (located in
>> $HADOOP_INSTALL/contrib/datajoin) for reduce-side join (version 1.0.3).
>>
>> It looks like DataJoinMapperBase implements Mapper interface (according to
>> old API) and not extends it (according to new API). This is a problem
>> because I cannot write Map classes that extend DataJoinMapperBase.
>>
>> Do we have newer version of data join jar?
>>
>> Thanks.
>
>


Re: Using REST to get ApplicationMaster info (Issue solved)

2012-07-25 Thread Robert Evans
Hmm, that is very odd.  It only checks the user if security is enabled to
warn the user about potentially accessing something unsafe.  I am not sure
why that would cause an issue.

--Bobby Evans

On 7/9/12 6:07 AM, "Prajakta Kalmegh"  wrote:

>Hi Robert
>
>I figured out the problem just now. To avoid the below error, I had to set
>the 'hadoop.http.staticuser.user' property in core-site.xml (defaults to
>dr.who). I can now get runtime data from AppMaster using *curl* as well as
>in GUI.
>
>I wonder if we have to set this property even when we are not specifying
>the yarn web-proxy address (when it runs as part of RM by default) as
>well.
>If yes, was it documented somewhere which I failed to see? :(
>
>Anyways, thanks for your response so far.
>
>Regards,
>Prajakta
>
>
>
>On Mon, Jul 9, 2012 at 3:29 PM, Prajakta Kalmegh 
>wrote:
>
>> Hi Robert
>>
>> I started the proxyserver explicitly by specifying a value for the
>> yarn.web-proxy.address in yarn-site.xml. The proxyserver did start and I
>> tried getting the JSON response using the following command :
>>
>> curl --compressed -H "Accept: application/json" -X GET "
>> 
>>http://localhost:8090/proxy/application_1341823967331_0001/ws/v1/mapreduc
>>e/jobs/job_1341823967331_0001
>> "
>>
>> However, it refused connection and below is the excerpt from the
>> Proxyserver logs:
>> -
>> 2012-07-09 14:26:40,402 INFO org.mortbay.log: Extract
>> 
>>jar:file:/home/prajakta/Projects/IRL/hadoop-common/hadoop-dist/target/had
>>oop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-yarn-common-3.0.0-SNAPSH
>>OT.jar!/webapps/proxy
>> to /tmp/Jetty_localhost_8090_proxy.ak3o30/webapp
>> 2012-07-09 14:26:40,992 INFO org.mortbay.log: Started
>> SelectChannelConnector@localhost:8090
>> 2012-07-09 14:26:40,993 INFO
>> org.apache.hadoop.yarn.service.AbstractService:
>> Service:org.apache.hadoop.yarn.server.webproxy.WebAppProxy is started.
>> 2012-07-09 14:26:40,993 INFO
>> org.apache.hadoop.yarn.service.AbstractService:
>> Service:org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer is
>>started.
>> 2012-07-09 14:33:26,039 INFO
>> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is
>> accessing unchecked
>> http://prajakta:44314/ws/v1/mapreduce/jobs/job_1341823967331_0001 which
>> is the app master GUI of application_1341823967331_0001 owned by
>>prajakta
>> 2012-07-09 14:33:29,277 INFO
>> org.apache.commons.httpclient.HttpMethodDirector: I/O exception
>> (org.apache.commons.httpclient.NoHttpResponseException) caught when
>> processing request: The server prajakta failed to respond
>> 2012-07-09 14:33:29,277 INFO
>> org.apache.commons.httpclient.HttpMethodDirector: Retrying request
>> 2012-07-09 14:33:29,284 WARN org.mortbay.log:
>> 
>>/proxy/application_1341823967331_0001/ws/v1/mapreduce/jobs/job_1341823967
>>331_0001:
>> java.net.SocketException: Connection reset
>> 2012-07-09 14:37:33,834 INFO
>> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is
>> accessing unchecked
>> 
>>http://prajakta:19888/jobhistory/job/job_1341823967331_0001/jobhistory/jo
>>b/job_1341823967331_0001which is the app master GUI of
>>application_1341823967331_0001 owned by
>> prajakta
>> ---
>>
>> I am not sure why http request object is setting my remoteUser to
>>dr.who.
>> :(
>>
>> I gather from 
>>that
>> this warning is posted only in case where security is disabled. I assume
>> that the proxy server is not disabled if security is disabled.
>>
>> Any idea what could be the reason for this I/O exception? Am I missing
>> setting any property for proper access. Please let me know.
>>
>> Regards,
>> Prajakta
>>
>>
>>
>>
>>
>>
>> On Fri, Jul 6, 2012 at 10:59 PM, Prajakta Kalmegh
>>wrote:
>>
>>> I am using hadoop trunk (forked from github). It supports RESTful APIs
>>>as
>>> I am able to retrieve JSON objects for RM (cluster/nodes info)+
>>> Historyserver. The only issue is with AppMaster REST API.
>>>
>>> Regards,
>>> Prajakta
>>>
>>>
>>>
>>> On Fri, Jul 6, 2012 at 10:55 PM, Robert Evans
>>>wrote:
>>>
 What version of hadoop are you using?  It could be that the version
you
 have does not have the RESTful APIs in it yet, and the proxy is
working
 just fine.

 --Bobby Evans

 On 7/6/12 12:06 PM, "Prajakta Kalmegh"  wrote:

 >Robert , Thanks for the response. If I do not provide any explicit
 >configuration for the proxy server, do I still need to start it using
 the
 >'yarn start proxy server'? I am currently not doing it.
 >
 >Also, I am able to access the html page for proxy using the
 > URL. (Note this
 url
 >does not have the '/ws/v1/ part in it. I get the html response when I
 >query
 >for this URL in runtime.
 >
 >So I assume the proxy server must be starting fine since I am able to
 >access this URL. I will try logging more details tomorrow from my

Re: (Repost) Using REST to get ApplicationMaster info

2012-07-25 Thread Robert Evans
I am sorry it has taken me so long to respond.  Work has been crazy :).

I really am at a loss right now as to why you are getting the connection
refused error. The error is happening between the RM and the AM. The dr.who
user is something you can ignore; it is the default name that is given to a
web user when security is disabled. You probably want to check the logs for
the AM to see if there is anything in there, but beyond that I am at a loss.

Sorry,

Bobby Evans

On 7/9/12 4:59 AM, "Prajakta Kalmegh"  wrote:

>Hi Robert
>
>I started the proxyserver explicitly by specifying a value for the
>yarn.web-proxy.address in yarn-site.xml. The proxyserver did start and I
>tried getting the JSON response using the following command :
>
>curl --compressed -H "Accept: application/json" -X GET "
>http://localhost:8090/proxy/application_1341823967331_0001/ws/v1/mapreduce
>/jobs/job_1341823967331_0001
>"
>
>However, it refused connection and below is the excerpt from the
>Proxyserver logs:
>-
>2012-07-09 14:26:40,402 INFO org.mortbay.log: Extract
>jar:file:/home/prajakta/Projects/IRL/hadoop-common/hadoop-dist/target/hado
>op-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-yarn-common-3.0.0-SNAPSHOT
>.jar!/webapps/proxy
>to /tmp/Jetty_localhost_8090_proxy.ak3o30/webapp
>2012-07-09 14:26:40,992 INFO org.mortbay.log: Started
>SelectChannelConnector@localhost:8090
>2012-07-09 14:26:40,993 INFO
>org.apache.hadoop.yarn.service.AbstractService:
>Service:org.apache.hadoop.yarn.server.webproxy.WebAppProxy is started.
>2012-07-09 14:26:40,993 INFO
>org.apache.hadoop.yarn.service.AbstractService:
>Service:org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer is
>started.
>2012-07-09 14:33:26,039 INFO
>org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is
>accessing unchecked
>http://prajakta:44314/ws/v1/mapreduce/jobs/job_1341823967331_0001 which is
>the app master GUI of application_1341823967331_0001 owned by prajakta
>2012-07-09 14:33:29,277 INFO
>org.apache.commons.httpclient.HttpMethodDirector: I/O exception
>(org.apache.commons.httpclient.NoHttpResponseException) caught when
>processing request: The server prajakta failed to respond
>2012-07-09 14:33:29,277 INFO
>org.apache.commons.httpclient.HttpMethodDirector: Retrying request
>2012-07-09 14:33:29,284 WARN org.mortbay.log:
>/proxy/application_1341823967331_0001/ws/v1/mapreduce/jobs/job_13418239673
>31_0001:
>java.net.SocketException: Connection reset
>2012-07-09 14:37:33,834 INFO
>org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is
>accessing unchecked
>http://prajakta:19888/jobhistory/job/job_1341823967331_0001/jobhistory/job
>/job_1341823967331_0001which
>is the app master GUI of application_1341823967331_0001 owned by
>prajakta
>---
>
>I am not sure why http request object is setting my remoteUser to dr.who.
>:(
>
>I gather from  that
>this warning is posted only in case where security is disabled. I assume
>that the proxy server is not disabled if security is disabled.
>
>Any idea what could be the reason for this I/O exception? Am I missing
>setting any property for proper access. Please let me know.
>
>Regards,
>Prajakta
>
>
>
>
>
>
>On Fri, Jul 6, 2012 at 10:59 PM, Prajakta Kalmegh
>wrote:
>
>> I am using hadoop trunk (forked from github). It supports RESTful APIs
>>as
>> I am able to retrieve JSON objects for RM (cluster/nodes info)+
>> Historyserver. The only issue is with AppMaster REST API.
>>
>> Regards,
>> Prajakta
>>
>>
>>
>> On Fri, Jul 6, 2012 at 10:55 PM, Robert Evans 
>>wrote:
>>
>>> What version of hadoop are you using?  It could be that the version you
>>> have does not have the RESTful APIs in it yet, and the proxy is working
>>> just fine.
>>>
>>> --Bobby Evans
>>>
>>> On 7/6/12 12:06 PM, "Prajakta Kalmegh"  wrote:
>>>
>>> >Robert , Thanks for the response. If I do not provide any explicit
>>> >configuration for the proxy server, do I still need to start it using
>>>the
>>> >'yarn start proxy server'? I am currently not doing it.
>>> >
>>> >Also, I am able to access the html page for proxy using the
>>> > URL. (Note this
>>>url
>>> >does not have the '/ws/v1/ part in it. I get the html response when I
>>> >query
>>> >for this URL in runtime.
>>> >
>>> >So I assume the proxy server must be starting fine since I am able to
>>> >access this URL. I will try logging more details tomorrow from my
>>>office
>>> >machine and will let you know the result.
>>> >
>>> >Regards,
>>> >Prajakta
>>> >
>>> >
>>> >
>>> >On Fri, Jul 6, 2012 at 10:22 PM, Robert Evans 
>>> wrote:
>>> >
>>> >> Sorry I did not respond sooner.  The default behavior is to have the
>>> >>proxy
>>> >> server run as part of the RM.  I am not really sure why it is not
>>>doing
>>> >> this in your case.  If you set the config yourself to be a URI that
>>>is
>>> >> different from that of the RM then you need to launch a standalone
>>> proxy
>>>

Re: Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-07-25 Thread Mapred Learn
You have to run the netdom trust command to set up the trust realm on your
Windows AD after the ksetup command:

netdom trust HADOOP.REALM /Domain:DOMAIN.REALM /add /realm
/passwordT:

and then the ktpass command:

ktpass /MITRealmName HADOOP.REALM /TrustEncryp RC4

I think the second ksetup command you ran is not needed.

Only the first one and the two above are usually sufficient.

Let me know how it goes after these two commands.



On Wed, Jul 25, 2012 at 9:25 AM, Ivan Frain  wrote:

> In AD:
> - I have created a one way incoming trust using the GUI (I guess it is the
> equivalent of the "netdom trust").
> - ksetup /addkdc HADOOP.REALM mitkdc.hadoop.realm
> - ksetup /SetEncTypeAttr HADOOP.REALM RC4-HMAC-MD5
>
> What do you think ?
>
>
>
>
> 2012/7/25 Mapred Learn 
>
> > Krb5 looks good.
> > Can you also share commands you ran in your Windows AD ?
> >
> > Sent from my iPhone
> >
> > On Jul 25, 2012, at 8:27 AM, Ivan Frain  wrote:
> >
> > > Thanks for your answer.
> > >
> > > I think I already did what you propose. Some comments in the remaining.
> > >
> > >
> > > 2012/7/25 Mapred Learn 
> > >
> > >> You need to set up a local realm on your KDC ( linux) and run commands
> > on
> > >> windows AD to add this realm as a trust realm on your AD realm.
> > >>
> > >
> > > I set up a KDC on the linux machine  and configure a one-way incoming
> > trust
> > > on AD to be trusted by the local KDC. I set the enc type as well on
> AD. I
> > > also create the appropriate remote TGT on the local KDC:
> > > krbtgt/HADOOP.REALM@DOMAIN.REALM with the same encoding type
> > >
> > >
> > >>
> > >> After this you need to modify your /etc/krb5.conf to include this
> local
> > >> realm as trust realm to your AD realm.
> > >>
> > >
> > > Here is the /etc/krb5.conf located in my local kdc on
> mitkdc.hadoop.realm
> > > machine. May be something is wrong there:
> > >
> > > [libdefaults]
> > >default_realm = HADOOP.REALM
> > > default_tkt_enctypes = arcfour-hmac-md5
> > > default_tgs_enctypes = arcfour-hmac-md5
> > >
> > > [realms]
> > >HADOOP.REALM = {
> > >  kdc = mitkdc.hadoop.realm
> > >admin_server = mitkdc.hadoop.realm
> > > default_domain = hadoop.realm
> > >}
> > > DOMAIN.REALM = {
> > > kdc = ad.domain.realm
> > > admin_server = ad.domain.realm
> > > default_domain = domain.realm
> > > }
> > >
> > > [domain_realm]
> > > .hadoop.realm = HADOOP.REALM
> > > hadoop.realm = HADOOP.REALM
> > > .domain.realm = DOMAIN.REALM
> > > domain.realm = DOMAIN.REALM
> > >
> > >
> > >
> > >>
> > >> And then you should be all set.
> > >>
> > >>
> > > I was hoping so but it is not ... yet ... the case
> > >
> > >
> > >
> > >> Sent from my iPhone
> > >>
> > >> On Jul 25, 2012, at 2:29 AM, Ivan Frain  wrote:
> > >>
> > >>> *Hi all,*
> > >>> *
> > >>> *
> > >>> *I am trying to setup a one-way cross realm trust between a MIT KDC
> and
> > >> an
> > >>> active directory server and up to now I did not success.*
> > >>> *I hope someone in this list will be able to help me.*
> > >>> *
> > >>> *
> > >>> *My config is as follows:*
> > >>> *  - hadoop version: 0.23.1 with security enable (kerberos).*
> > >>> *  - hadoop realm (mitkdc): HADOOP.REALM*
> > >>> *  - 1 linux node (mitkdc.hadoop.realm - 192.168.198.254) running :
> > hdfs
> > >>> namenode, hdfs datanode, mit kdc*
> > >>> *  - 1 windows node (ad.domain.realm - 192.168.198.253) running:
> active
> > >>> directory 2003*
> > >>> *  - AD realm: DOMAIN.REALM*
> > >>> *
> > >>> *
> > >>> *Everything works well with kerberos enabled if I only use the linux
> > >>> machine with users having principal in the mitkdc: ivan@HADOOP.REALM
> *
> > >>> *
> > >>> *
> > >>> *What I am trying to do is to use the user database in the Active
> > >> directory
> > >>> (users with principals like ivan@DOMAIN.REALM)*
> > >>> *
> > >>> *
> > >>> *To do that, I setup a one-way cross realm as explained here:
> > >>>
> > >>
> >
> https://ccp.cloudera.com/display/CDH4DOC/Integrating+Hadoop+Security+with+Active+Directory
> > >>> *
> > >>> *
> > >>> *
> > >>> *From the linux machine I can authenticate against an active
> directory
> > >> user
> > >>> with the kinit command but when I perform a query using the hadoop
> > >> command
> > >>> I have the following error message:*
> > >>> -
> > >>> hdfs@mitkdc:~$ kinit ivan@DOMAIN.REALM
> > >>> Password for ivan@DOMAIN.REALM:
> > >>>
> > >>> hdfs@mitkdc:~$ klist -e
> > >>> Ticket cache: FILE:/tmp/krb5cc_10003
> > >>> Default principal: ivan@DOMAIN.REALM
> > >>>
> > >>> Valid startingExpires   Service principal
> > >>> 25/07/2012 11:00  25/07/2012 20:59  krbtgt/DOMAIN.REALM@DOMAIN.REALM
> > >>> renew until 26/07/2012 11:00, Etype (skey, tkt): arcfour-hmac,
> > >> arcfour-hmac
> > >>>
> > >>> hdfs@mitkdc:~$ hadoop/bin/hadoop fs -ls /user
> > >>> 12/07/25 11:00:50 ERROR security.UserGroupInformation:
> > >>> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> > >>> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused

Re: Can I change "hadoop.tmp.dir" for each job run without formatting

2012-07-25 Thread Alok Kumar
Hi Abhay,

On Wed, Jul 25, 2012 at 10:44 PM, Abhay Ratnaparkhi
 wrote:
> "hadoop.tmp.dir" points to the directory on local disk to store
> intermediate task related data.
>
> It's currently mounted to "/tmp/hadoop" for me. Some of my jobs are running
> and Filesystem on which '/tmp' is mounted is getting full.
> Is it possible to change "hadoop.tmp.dir" parameter before submitting a new
> job?

You can override "hadoop.tmp.dir" every time before submitting your Job.
I tried it like this:

Configuration config = new Configuration();
config.set("hadoop.tmp.dir", "/home/user/some-other-path");
Job job = new Job(config, "Job1");

It produced the same result (I didn't format anything).

Thanks
-- 
Alok


Can I change "hadoop.tmp.dir" for each job run without formatting

2012-07-25 Thread Abhay Ratnaparkhi
"hadoop.tmp.dir" points to the directory on local disk to store
intermediate task-related data.

It's currently set to "/tmp/hadoop" for me. Some of my jobs are running,
and the filesystem on which '/tmp' is mounted is getting full.
Is it possible to change the "hadoop.tmp.dir" parameter before submitting a new
job?

~Abhay


Re: HBase multi-user security

2012-07-25 Thread Alejandro Abdelnur
Tony,

Sorry, missed this email earlier. This seems more appropriate for the HBase
aliases.

Thx.

On Wed, Jul 11, 2012 at 8:41 AM, Tony Dean  wrote:

> Hi,
>
> Looking at this further, it appears that when HBaseRPC is creating a proxy
> (e.g., SecureRpcEngine), it injects the current user:
> User.getCurrent() which by default is the cached Kerberos TGT (kinit'ed
> user - using the "hadoop-user-kerberos" JAAS context).
>
> Since the server proxy always uses User.getCurrent(), how can an
> application inject the user it wants to use for authorization checks on the
> peer (region server)?
>
> And since SecureHadoopUser is a static class, how can you have more than 1
> active user in the same application?
>
> What you have works for a single user application like the hbase shell,
> but what about a multi-user application?
>
> Am I missing something?
>
> Thanks!
>
> -Tony
>
> -Original Message-
> From: Alejandro Abdelnur [mailto:t...@cloudera.com]
> Sent: Monday, July 02, 2012 11:40 AM
> To: common-user@hadoop.apache.org
> Subject: Re: hadoop security API (repost)
>
> Tony,
>
> If you are doing a server app that interacts with the cluster on behalf of
> different users (like Ooize, as you mentioned in your email), then you
> should use the proxyuser capabilities of Hadoop.
>
> * Configure user MYSERVERUSER as proxyuser in Hadoop core-site.xml (this
> requires 2 properties settings, HOSTS and GROUPS).
> * Run your server app as MYSERVERUSER and have a Kerberos principal
> MYSERVERUSER/MYSERVERHOST
> * Initialize your server app loading the MYSERVERUSER/MYSERVERHOST keytab
> * Use the UGI.doAs() to create JobClient/Filesystem instances using the
> user you want to do something on behalf
> * Keep in mind that all the users you need to do something on behalf
> should be valid Unix users in the cluster
> * If those users need direct access to the cluster, they'll have to be
> also defined in in the KDC user database.
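As an illustration of the doAs() pattern described above, here is a minimal
sketch (the principal, keytab path and user names are placeholders, and the
matching hadoop.proxyuser.* settings must be present in core-site.xml):

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    UserGroupInformation.setConfiguration(conf);

    // Log in once as the server principal from its keytab.
    UserGroupInformation.loginUserFromKeytab(
        "myserveruser/myserverhost@EXAMPLE.REALM", "/etc/security/myserver.keytab");

    // Impersonate the end user on whose behalf the server is acting.
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        "enduser", UserGroupInformation.getLoginUser());

    // Everything inside run() talks to the cluster as "enduser".
    proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        System.out.println(fs.exists(new Path("/user/enduser")));
        return null;
      }
    });
  }
}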
>
> Hope this helps.
>
> Thx
>
> On Mon, Jul 2, 2012 at 6:22 AM, Tony Dean  wrote:
> > Yes, but this will not work in a multi-tenant environment.  I need to be
> able to create a Kerberos TGT per execution thread.
> >
> > I was hoping through JAAS that I could inject the name of the current
> principal and authenticate against it.  I'm sure there is a best practice
> for hadoop/hbase client API authentication, just not sure what it is.
> >
> > Thank you for your comment.  The solution may well be associated with
> the UserGroupInformation class.  Hopefully, other ideas will come from this
> thread.
> >
> > Thanks.
> >
> > -Tony
> >
> > -Original Message-
> > From: Ivan Frain [mailto:ivan.fr...@gmail.com]
> > Sent: Monday, July 02, 2012 8:14 AM
> > To: common-user@hadoop.apache.org
> > Subject: Re: hadoop security API (repost)
> >
> > Hi Tony,
> >
> > I am currently working on this to access HDFS securely and
> programmaticaly.
> > What I have found so far may help even if I am not 100% sure this is the
> right way to proceed.
> >
> > If you have already obtained a TGT from the kinit command, hadoop
> library will locate it "automatically" if the name of the ticket cache
> corresponds to default location. On Linux it is located
> /tmp/krb5cc_uid-number.
> >
> > For example, with my linux user hdfs, I get a TGT for hadoop user 'ivan'
> > meaning you can impersonate ivan from hdfs linux user:
> > --
> > hdfs@mitkdc:~$ klist
> > Ticket cache: FILE:/tmp/krb5cc_10003
> > Default principal: i...@hadoop.lan
> >
> > Valid startingExpires   Service principal
> > 02/07/2012 13:59  02/07/2012 23:59  krbtgt/hadoop@hadoop.lan renew
> > until 03/07/2012 13:59
> > ---
> >
> > Then, you just have to set the right security options in your hadoop
> client in java and the identity will be i...@hadoop.lan for our example.
> In my tests, I only use HDFS and here a snippet of code to have access to a
> secure hdfs cluster assuming the previous TGT (ivan's impersonation):
> >
> > 
> >  val conf: HdfsConfiguration = new HdfsConfiguration()
> >
> > conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
> > "kerberos")
> >
> > conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
> > "true")
> >  conf.set(DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY,
> > serverPrincipal)
> >
> >  UserGroupInformation.setConfiguration(conf)
> >
> >  val fs = FileSystem.get(new URI(hdfsUri), conf)
> > 
> >
> > Using this 'fs' is a handler to access hdfs securely as user 'ivan' even
> if ivan does not appear in the hadoop client code.
> >
> > Anyway, I also see two other options:
> >   * Setting the KRB5CCNAME environment variable to point to the right
> ticketCache file
> >   * Specifying the keytab file you want to use from the
> UserGroupInformation singleton API:
> > UserGroupInformation.loginUserFromKeyt

Re: Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-07-25 Thread Ivan Frain
In AD:
- I have created a one-way incoming trust using the GUI (I guess it is the
equivalent of "netdom trust").
- ksetup /addkdc HADOOP.REALM mitkdc.hadoop.realm
- ksetup /SetEncTypeAttr HADOOP.REALM RC4-HMAC-MD5

What do you think?




2012/7/25 Mapred Learn 

> Krb5 looks good.
> Can you also share commands you ran in your Windows AD ?
>
> Sent from my iPhone
>
> On Jul 25, 2012, at 8:27 AM, Ivan Frain  wrote:
>
> > Thanks for your answer.
> >
> > I think I already did what you propose. Some comments in the remaining.
> >
> >
> > 2012/7/25 Mapred Learn 
> >
> >> You need to set up a local realm on your KDC ( linux) and run commands
> on
> >> windows AD to add this realm as a trust realm on your AD realm.
> >>
> >
> > I set up a KDC on the linux machine  and configure a one-way incoming
> trust
> > on AD to be trusted by the local KDC. I set the enc type as well on AD. I
> > also create the appropriate remote TGT on the local KDC:
> > krbtgt/HADOOP.REALM@DOMAIN.REALM with the same encoding type
> >
> >
> >>
> >> After this you need to modify your /etc/krb5.conf to include this local
> >> realm as trust realm to your AD realm.
> >>
> >
> > Here is the /etc/krb5.conf located in my local kdc on mitkdc.hadoop.realm
> > machine. May be something is wrong there:
> >
> > [libdefaults]
> >default_realm = HADOOP.REALM
> > default_tkt_enctypes = arcfour-hmac-md5
> > default_tgs_enctypes = arcfour-hmac-md5
> >
> > [realms]
> >HADOOP.REALM = {
> >  kdc = mitkdc.hadoop.realm
> >admin_server = mitkdc.hadoop.realm
> > default_domain = hadoop.realm
> >}
> > DOMAIN.REALM = {
> > kdc = ad.domain.realm
> > admin_server = ad.domain.realm
> > default_domain = domain.realm
> > }
> >
> > [domain_realm]
> > .hadoop.realm = HADOOP.REALM
> > hadoop.realm = HADOOP.REALM
> > .domain.realm = DOMAIN.REALM
> > domain.realm = DOMAIN.REALM
> >
> >
> >
> >>
> >> And then you should be all set.
> >>
> >>
> > I was hoping so but it is not ... yet ... the case
> >
> >
> >
> >> Sent from my iPhone
> >>
> >> On Jul 25, 2012, at 2:29 AM, Ivan Frain  wrote:
> >>
> >>> *Hi all,*
> >>> *
> >>> *
> >>> *I am trying to setup a one-way cross realm trust between a MIT KDC and
> >> an
> >>> active directory server and up to now I did not success.*
> >>> *I hope someone in this list will be able to help me.*
> >>> *
> >>> *
> >>> *My config is as follows:*
> >>> *  - hadoop version: 0.23.1 with security enable (kerberos).*
> >>> *  - hadoop realm (mitkdc): HADOOP.REALM*
> >>> *  - 1 linux node (mitkdc.hadoop.realm - 192.168.198.254) running :
> hdfs
> >>> namenode, hdfs datanode, mit kdc*
> >>> *  - 1 windows node (ad.domain.realm - 192.168.198.253) running: active
> >>> directory 2003*
> >>> *  - AD realm: DOMAIN.REALM*
> >>> *
> >>> *
> >>> *Everything works well with kerberos enabled if I only use the linux
> >>> machine with users having principal in the mitkdc: ivan@HADOOP.REALM*
> >>> *
> >>> *
> >>> *What I am trying to do is to use the user database in the Active
> >> directory
> >>> (users with principals like ivan@DOMAIN.REALM)*
> >>> *
> >>> *
> >>> *To do that, I setup a one-way cross realm as explained here:
> >>>
> >>
> https://ccp.cloudera.com/display/CDH4DOC/Integrating+Hadoop+Security+with+Active+Directory
> >>> *
> >>> *
> >>> *
> >>> *From the linux machine I can authenticate against an active directory
> >> user
> >>> with the kinit command but when I perform a query using the hadoop
> >> command
> >>> I have the following error message:*
> >>> -
> >>> hdfs@mitkdc:~$ kinit ivan@DOMAIN.REALM
> >>> Password for ivan@DOMAIN.REALM:
> >>>
> >>> hdfs@mitkdc:~$ klist -e
> >>> Ticket cache: FILE:/tmp/krb5cc_10003
> >>> Default principal: ivan@DOMAIN.REALM
> >>>
> >>> Valid startingExpires   Service principal
> >>> 25/07/2012 11:00  25/07/2012 20:59  krbtgt/DOMAIN.REALM@DOMAIN.REALM
> >>> renew until 26/07/2012 11:00, Etype (skey, tkt): arcfour-hmac,
> >> arcfour-hmac
> >>>
> >>> hdfs@mitkdc:~$ hadoop/bin/hadoop fs -ls /user
> >>> 12/07/25 11:00:50 ERROR security.UserGroupInformation:
> >>> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> >>> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> >>> GSSException: No valid credentials provided (Mechanism level: Fail to
> >>> create credential. (63) - No service creds)]
> >>> 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating logout
> >> for
> >>> ivan@DOMAIN.REALM
> >>> 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating
> re-login
> >>> for ivan@DOMAIN.REALM
> >>> 12/07/25 11:00:53 ERROR security.UserGroupInformation:
> >>> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> >>> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> >>> GSSException: No valid credentials provided (Mechanism level: Fail to
> >>> create credential. (63) - No service creds)]
> >>> 12/07/25 11:00:53 WARN security.UserGroupInformation: Not attempting to
> >>

Re: Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-07-25 Thread Mapred Learn
Krb5 looks good.
Can you also share the commands you ran on your Windows AD?

Sent from my iPhone

On Jul 25, 2012, at 8:27 AM, Ivan Frain  wrote:

> Thanks for your answer.
> 
> I think I already did what you propose. Some comments in the remaining.
> 
> 
> 2012/7/25 Mapred Learn 
> 
>> You need to set up a local realm on your KDC ( linux) and run commands on
>> windows AD to add this realm as a trust realm on your AD realm.
>> 
> 
> I set up a KDC on the linux machine  and configure a one-way incoming trust
> on AD to be trusted by the local KDC. I set the enc type as well on AD. I
> also create the appropriate remote TGT on the local KDC:
> krbtgt/HADOOP.REALM@DOMAIN.REALM with the same encoding type
> 
> 
>> 
>> After this you need to modify your /etc/krb5.conf to include this local
>> realm as trust realm to your AD realm.
>> 
> 
> Here is the /etc/krb5.conf located in my local kdc on mitkdc.hadoop.realm
> machine. May be something is wrong there:
> 
> [libdefaults]
>default_realm = HADOOP.REALM
> default_tkt_enctypes = arcfour-hmac-md5
> default_tgs_enctypes = arcfour-hmac-md5
> 
> [realms]
>HADOOP.REALM = {
>  kdc = mitkdc.hadoop.realm
>admin_server = mitkdc.hadoop.realm
> default_domain = hadoop.realm
>}
> DOMAIN.REALM = {
> kdc = ad.domain.realm
> admin_server = ad.domain.realm
> default_domain = domain.realm
> }
> 
> [domain_realm]
> .hadoop.realm = HADOOP.REALM
> hadoop.realm = HADOOP.REALM
> .domain.realm = DOMAIN.REALM
> domain.realm = DOMAIN.REALM
> 
> 
> 
>> 
>> And then you should be all set.
>> 
>> 
> I was hoping so but it is not ... yet ... the case
> 
> 
> 
>> Sent from my iPhone
>> 
>> On Jul 25, 2012, at 2:29 AM, Ivan Frain  wrote:
>> 
>>> *Hi all,*
>>> *
>>> *
>>> *I am trying to setup a one-way cross realm trust between a MIT KDC and
>> an
>>> active directory server and up to now I did not success.*
>>> *I hope someone in this list will be able to help me.*
>>> *
>>> *
>>> *My config is as follows:*
>>> *  - hadoop version: 0.23.1 with security enable (kerberos).*
>>> *  - hadoop realm (mitkdc): HADOOP.REALM*
>>> *  - 1 linux node (mitkdc.hadoop.realm - 192.168.198.254) running : hdfs
>>> namenode, hdfs datanode, mit kdc*
>>> *  - 1 windows node (ad.domain.realm - 192.168.198.253) running: active
>>> directory 2003*
>>> *  - AD realm: DOMAIN.REALM*
>>> *
>>> *
>>> *Everything works well with kerberos enabled if I only use the linux
>>> machine with users having principal in the mitkdc: ivan@HADOOP.REALM*
>>> *
>>> *
>>> *What I am trying to do is to use the user database in the Active
>> directory
>>> (users with principals like ivan@DOMAIN.REALM)*
>>> *
>>> *
>>> *To do that, I setup a one-way cross realm as explained here:
>>> 
>> https://ccp.cloudera.com/display/CDH4DOC/Integrating+Hadoop+Security+with+Active+Directory
>>> *
>>> *
>>> *
>>> *From the linux machine I can authenticate against an active directory
>> user
>>> with the kinit command but when I perform a query using the hadoop
>> command
>>> I have the following error message:*
>>> -
>>> hdfs@mitkdc:~$ kinit ivan@DOMAIN.REALM
>>> Password for ivan@DOMAIN.REALM:
>>> 
>>> hdfs@mitkdc:~$ klist -e
>>> Ticket cache: FILE:/tmp/krb5cc_10003
>>> Default principal: ivan@DOMAIN.REALM
>>> 
>>> Valid startingExpires   Service principal
>>> 25/07/2012 11:00  25/07/2012 20:59  krbtgt/DOMAIN.REALM@DOMAIN.REALM
>>> renew until 26/07/2012 11:00, Etype (skey, tkt): arcfour-hmac,
>> arcfour-hmac
>>> 
>>> hdfs@mitkdc:~$ hadoop/bin/hadoop fs -ls /user
>>> 12/07/25 11:00:50 ERROR security.UserGroupInformation:
>>> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
>>> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>> GSSException: No valid credentials provided (Mechanism level: Fail to
>>> create credential. (63) - No service creds)]
>>> 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating logout
>> for
>>> ivan@DOMAIN.REALM
>>> 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating re-login
>>> for ivan@DOMAIN.REALM
>>> 12/07/25 11:00:53 ERROR security.UserGroupInformation:
>>> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
>>> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>> GSSException: No valid credentials provided (Mechanism level: Fail to
>>> create credential. (63) - No service creds)]
>>> 12/07/25 11:00:53 WARN security.UserGroupInformation: Not attempting to
>>> re-login since the last re-login was attempted less than 600 seconds
>> before.
>>> 12/07/25 11:00:56 ERROR security.UserGroupInformation:
>>> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
>>> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>> GSSException: No valid credentials provided (Mechanism level: Fail to
>>> create credential. (63) - No service creds)]
>>> 12/07/25 11:00:56 WARN security.UserGroupInformation: Not attempting to
>>> re-login since the last re-login was

Re: Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-07-25 Thread Ivan Frain
Thanks for your answer.

I think I already did what you propose. Some comments follow inline.


2012/7/25 Mapred Learn 

> You need to set up a local realm on your KDC ( linux) and run commands on
> windows AD to add this realm as a trust realm on your AD realm.
>

I set up a KDC on the Linux machine and configured a one-way incoming trust
on AD to be trusted by the local KDC. I set the enc type on AD as well. I
also created the appropriate remote TGT principal on the local KDC:
krbtgt/HADOOP.REALM@DOMAIN.REALM, with the same encryption type.


>
> After this you need to modify your /etc/krb5.conf to include this local
> realm as trust realm to your AD realm.
>

Here is the /etc/krb5.conf located on my local KDC on the mitkdc.hadoop.realm
machine. Maybe something is wrong there:

[libdefaults]
    default_realm = HADOOP.REALM
    default_tkt_enctypes = arcfour-hmac-md5
    default_tgs_enctypes = arcfour-hmac-md5

[realms]
    HADOOP.REALM = {
        kdc = mitkdc.hadoop.realm
        admin_server = mitkdc.hadoop.realm
        default_domain = hadoop.realm
    }
    DOMAIN.REALM = {
        kdc = ad.domain.realm
        admin_server = ad.domain.realm
        default_domain = domain.realm
    }

[domain_realm]
    .hadoop.realm = HADOOP.REALM
    hadoop.realm = HADOOP.REALM
    .domain.realm = DOMAIN.REALM
    domain.realm = DOMAIN.REALM



>
> And then you should be all set.
>
>
I was hoping so but it is not ... yet ... the case



> Sent from my iPhone
>
> On Jul 25, 2012, at 2:29 AM, Ivan Frain  wrote:
>
> > *Hi all,*
> > *
> > *
> > *I am trying to setup a one-way cross realm trust between a MIT KDC and
> an
> > active directory server and up to now I did not success.*
> > *I hope someone in this list will be able to help me.*
> > *
> > *
> > *My config is as follows:*
> > *  - hadoop version: 0.23.1 with security enable (kerberos).*
> > *  - hadoop realm (mitkdc): HADOOP.REALM*
> > *  - 1 linux node (mitkdc.hadoop.realm - 192.168.198.254) running : hdfs
> > namenode, hdfs datanode, mit kdc*
> > *  - 1 windows node (ad.domain.realm - 192.168.198.253) running: active
> > directory 2003*
> > *  - AD realm: DOMAIN.REALM*
> > *
> > *
> > *Everything works well with kerberos enabled if I only use the linux
> > machine with users having principal in the mitkdc: ivan@HADOOP.REALM*
> > *
> > *
> > *What I am trying to do is to use the user database in the Active
> directory
> > (users with principals like ivan@DOMAIN.REALM)*
> > *
> > *
> > *To do that, I setup a one-way cross realm as explained here:
> >
> https://ccp.cloudera.com/display/CDH4DOC/Integrating+Hadoop+Security+with+Active+Directory
> > *
> > *
> > *
> > *From the linux machine I can authenticate against an active directory
> user
> > with the kinit command but when I perform a query using the hadoop
> command
> > I have the following error message:*
> > -
> > hdfs@mitkdc:~$ kinit ivan@DOMAIN.REALM
> > Password for ivan@DOMAIN.REALM:
> >
> > hdfs@mitkdc:~$ klist -e
> > Ticket cache: FILE:/tmp/krb5cc_10003
> > Default principal: ivan@DOMAIN.REALM
> >
> > Valid startingExpires   Service principal
> > 25/07/2012 11:00  25/07/2012 20:59  krbtgt/DOMAIN.REALM@DOMAIN.REALM
> > renew until 26/07/2012 11:00, Etype (skey, tkt): arcfour-hmac,
> arcfour-hmac
> >
> > hdfs@mitkdc:~$ hadoop/bin/hadoop fs -ls /user
> > 12/07/25 11:00:50 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> > cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> > GSSException: No valid credentials provided (Mechanism level: Fail to
> > create credential. (63) - No service creds)]
> > 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating logout
> for
> > ivan@DOMAIN.REALM
> > 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating re-login
> > for ivan@DOMAIN.REALM
> > 12/07/25 11:00:53 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> > cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> > GSSException: No valid credentials provided (Mechanism level: Fail to
> > create credential. (63) - No service creds)]
> > 12/07/25 11:00:53 WARN security.UserGroupInformation: Not attempting to
> > re-login since the last re-login was attempted less than 600 seconds
> before.
> > 12/07/25 11:00:56 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> > cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> > GSSException: No valid credentials provided (Mechanism level: Fail to
> > create credential. (63) - No service creds)]
> > 12/07/25 11:00:56 WARN security.UserGroupInformation: Not attempting to
> > re-login since the last re-login was attempted less than 600 seconds
> before.
> > 12/07/25 11:00:58 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> > cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> > GSSException: No v

Re: Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-07-25 Thread Mapred Learn
You need to set up a local realm on your KDC (Linux) and run commands on the
Windows AD to add this realm as a trusted realm on your AD realm.

After this you need to modify your /etc/krb5.conf to include this local realm
as a trust realm to your AD realm.

And then you should be all set.

Sent from my iPhone

On Jul 25, 2012, at 2:29 AM, Ivan Frain  wrote:

> *Hi all,*
> *
> *
> *I am trying to setup a one-way cross realm trust between a MIT KDC and an
> active directory server and up to now I did not success.*
> *I hope someone in this list will be able to help me.*
> *
> *
> *My config is as follows:*
> *  - hadoop version: 0.23.1 with security enable (kerberos).*
> *  - hadoop realm (mitkdc): HADOOP.REALM*
> *  - 1 linux node (mitkdc.hadoop.realm - 192.168.198.254) running : hdfs
> namenode, hdfs datanode, mit kdc*
> *  - 1 windows node (ad.domain.realm - 192.168.198.253) running: active
> directory 2003*
> *  - AD realm: DOMAIN.REALM*
> *
> *
> *Everything works well with kerberos enabled if I only use the linux
> machine with users having principal in the mitkdc: ivan@HADOOP.REALM*
> *
> *
> *What I am trying to do is to use the user database in the Active directory
> (users with principals like ivan@DOMAIN.REALM)*
> *
> *
> *To do that, I setup a one-way cross realm as explained here:
> https://ccp.cloudera.com/display/CDH4DOC/Integrating+Hadoop+Security+with+Active+Directory
> *
> *
> *
> *From the linux machine I can authenticate against an active directory user
> with the kinit command but when I perform a query using the hadoop command
> I have the following error message:*
> -
> hdfs@mitkdc:~$ kinit ivan@DOMAIN.REALM
> Password for ivan@DOMAIN.REALM:
> 
> hdfs@mitkdc:~$ klist -e
> Ticket cache: FILE:/tmp/krb5cc_10003
> Default principal: ivan@DOMAIN.REALM
> 
> Valid startingExpires   Service principal
> 25/07/2012 11:00  25/07/2012 20:59  krbtgt/DOMAIN.REALM@DOMAIN.REALM
> renew until 26/07/2012 11:00, Etype (skey, tkt): arcfour-hmac, arcfour-hmac
> 
> hdfs@mitkdc:~$ hadoop/bin/hadoop fs -ls /user
> 12/07/25 11:00:50 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Fail to
> create credential. (63) - No service creds)]
> 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating logout for
> ivan@DOMAIN.REALM
> 12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating re-login
> for ivan@DOMAIN.REALM
> 12/07/25 11:00:53 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Fail to
> create credential. (63) - No service creds)]
> 12/07/25 11:00:53 WARN security.UserGroupInformation: Not attempting to
> re-login since the last re-login was attempted less than 600 seconds before.
> 12/07/25 11:00:56 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Fail to
> create credential. (63) - No service creds)]
> 12/07/25 11:00:56 WARN security.UserGroupInformation: Not attempting to
> re-login since the last re-login was attempted less than 600 seconds before.
> 12/07/25 11:00:58 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Fail to
> create credential. (63) - No service creds)]
> 12/07/25 11:00:58 WARN security.UserGroupInformation: Not attempting to
> re-login since the last re-login was attempted less than 600 seconds before.
> 12/07/25 11:00:59 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Fail to
> create credential. (63) - No service creds)]
> 12/07/25 11:00:59 WARN security.UserGroupInformation: Not attempting to
> re-login since the last re-login was attempted less than 600 seconds before.
> 12/07/25 11:01:02 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Fail to
> create credential. (63) - No service creds)]
> 12/07/25 11:01:02 WARN ipc.Client: Couldn't setup connection for
> ivan@DOMAIN.REALM to hdfs/mitkdc.hadoop.realm@HADOOP.REALM
> 12/07/25 11:01:02 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ivan

Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-07-25 Thread Ivan Frain
Hi all,

I am trying to set up a one-way cross-realm trust between a MIT KDC and an
Active Directory server, and up to now I have not succeeded.
I hope someone on this list will be able to help me.

My config is as follows:
  - hadoop version: 0.23.1 with security enabled (Kerberos)
  - hadoop realm (mitkdc): HADOOP.REALM
  - 1 Linux node (mitkdc.hadoop.realm - 192.168.198.254) running: hdfs
namenode, hdfs datanode, mit kdc
  - 1 Windows node (ad.domain.realm - 192.168.198.253) running: Active
Directory 2003
  - AD realm: DOMAIN.REALM

Everything works well with Kerberos enabled if I only use the Linux
machine with users having a principal in the mitkdc: ivan@HADOOP.REALM

What I am trying to do is to use the user database in the Active Directory
(users with principals like ivan@DOMAIN.REALM).

To do that, I set up a one-way cross-realm trust as explained here:
https://ccp.cloudera.com/display/CDH4DOC/Integrating+Hadoop+Security+with+Active+Directory

From the Linux machine I can authenticate against an Active Directory user
with the kinit command, but when I perform a query using the hadoop command
I get the following error message:
-
hdfs@mitkdc:~$ kinit ivan@DOMAIN.REALM
Password for ivan@DOMAIN.REALM:

hdfs@mitkdc:~$ klist -e
Ticket cache: FILE:/tmp/krb5cc_10003
Default principal: ivan@DOMAIN.REALM

Valid startingExpires   Service principal
25/07/2012 11:00  25/07/2012 20:59  krbtgt/DOMAIN.REALM@DOMAIN.REALM
renew until 26/07/2012 11:00, Etype (skey, tkt): arcfour-hmac, arcfour-hmac

hdfs@mitkdc:~$ hadoop/bin/hadoop fs -ls /user
12/07/25 11:00:50 ERROR security.UserGroupInformation:
PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Fail to
create credential. (63) - No service creds)]
12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating logout for
ivan@DOMAIN.REALM
12/07/25 11:00:50 INFO security.UserGroupInformation: Initiating re-login
for ivan@DOMAIN.REALM
12/07/25 11:00:53 ERROR security.UserGroupInformation:
PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Fail to
create credential. (63) - No service creds)]
12/07/25 11:00:53 WARN security.UserGroupInformation: Not attempting to
re-login since the last re-login was attempted less than 600 seconds before.
12/07/25 11:00:56 ERROR security.UserGroupInformation:
PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Fail to
create credential. (63) - No service creds)]
12/07/25 11:00:56 WARN security.UserGroupInformation: Not attempting to
re-login since the last re-login was attempted less than 600 seconds before.
12/07/25 11:00:58 ERROR security.UserGroupInformation:
PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Fail to
create credential. (63) - No service creds)]
12/07/25 11:00:58 WARN security.UserGroupInformation: Not attempting to
re-login since the last re-login was attempted less than 600 seconds before.
12/07/25 11:00:59 ERROR security.UserGroupInformation:
PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Fail to
create credential. (63) - No service creds)]
12/07/25 11:00:59 WARN security.UserGroupInformation: Not attempting to
re-login since the last re-login was attempted less than 600 seconds before.
12/07/25 11:01:02 ERROR security.UserGroupInformation:
PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Fail to
create credential. (63) - No service creds)]
12/07/25 11:01:02 WARN ipc.Client: Couldn't setup connection for
ivan@DOMAIN.REALM to hdfs/mitkdc.hadoop.realm@HADOOP.REALM
12/07/25 11:01:02 ERROR security.UserGroupInformation:
PriviledgedActionException as:ivan@DOMAIN.REALM (auth:KERBEROS)
cause:java.io.IOException: Couldn't setup connection for
ivan@DOMAIN.REALMto hdfs/mitkdc.hadoop.realm@HADOOP.REALM
ls: Failed on local exception: java.io.IOException: Couldn't setup
connection for ivan@DOMAIN.REALM to hdfs/mitkdc.hadoop.realm@HADOOP.REALM;
Host Details : local host is: "mitkdc.hadoop.realm/192.168.198.254";
destination host is: ""mitkdc.hadoop.realm":8020;
-

In the mitkdc server log I can see something like the following, meaning
that the encryption types are not supported:

Re: Hadoop Example in java

2012-07-25 Thread Lance Norskog
http://lintool.github.com/MapReduceAlgorithms/index.html

The original book for a lot of really cool map-reduce algorithms.

After you "get" what these classes do, get Hive and Pig. They both
have an 'explain plan' command that shows you the chain of map-reduce
jobs needed for your high-level code. Really helpful.

On Tue, Jul 24, 2012 at 10:13 PM, Saravanan Nagarajan
 wrote:
> HI vikas,
>
> You can download example programes from facebook group link below:
> http://www.facebook.com/groups/416125741763625/
>
>
> It contain some ppt as well.
>
> Regards,
> Saravanan Nagarajan
>
> On Wed, Jul 25, 2012 at 10:17 AM, minumichael > wrote:
>
>>
>> Hi Vikas,
>>
>> You could also try out various examples like finding the maximum
>> temperature
>> from a given dataset
>>
>> 006701199091950051507004...999N9+1+999...
>> 004301199091950051512004...999N9+00221+999...
>> 004301199091950051518004...999N9-00111+999...
>> 004301265091949032412004...051N9+0+999...
>> 004301265091949032418004...051N9+00781+999...
>>
>> //Mapper for maximum temperature example
>>
>> import java.io.IOException;
>> import org.apache.hadoop.io.IntWritable;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Mapper;
>>
>> public class MaxTemperatureMapper
>>     extends Mapper<LongWritable, Text, Text, IntWritable> {
>>
>>   // 9999 marks a missing reading in the NCDC records
>>   private static final int MISSING = 9999;
>>
>>   @Override
>>   public void map(LongWritable key, Text value, Context context)
>>       throws IOException, InterruptedException {
>>     String line = value.toString();
>>     String year = line.substring(15, 19);
>>     int airTemperature;
>>     if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
>>       airTemperature = Integer.parseInt(line.substring(88, 92));
>>     } else {
>>       airTemperature = Integer.parseInt(line.substring(87, 92));
>>     }
>>     String quality = line.substring(92, 93);
>>     if (airTemperature != MISSING && quality.matches("[01459]")) {
>>       context.write(new Text(year), new IntWritable(airTemperature));
>>     }
>>   }
>> }
>>
>> //Reducer for maximum temperature example
>> import java.io.IOException;
>> import org.apache.hadoop.io.IntWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Reducer;
>>
>> public class MaxTemperatureReducer
>>     extends Reducer<Text, IntWritable, Text, IntWritable> {
>>
>>   @Override
>>   public void reduce(Text key, Iterable<IntWritable> values, Context context)
>>       throws IOException, InterruptedException {
>>     int maxValue = Integer.MIN_VALUE;
>>     for (IntWritable value : values) {
>>       maxValue = Math.max(maxValue, value.get());
>>     }
>>     context.write(key, new IntWritable(maxValue));
>>   }
>> }
>>
>> //Application to find the maximum temperature in the weather dataset
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.IntWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>
>> public class MaxTemperature {
>>   public static void main(String[] args) throws Exception {
>>     if (args.length != 2) {
>>       System.err.println("Usage: MaxTemperature <input path> <output path>");
>>       System.exit(-1);
>>     }
>>     Job job = new Job();
>>     job.setJarByClass(MaxTemperature.class);
>>     job.setJobName("Max temperature");
>>     FileInputFormat.addInputPath(job, new Path(args[0]));
>>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>     job.setMapperClass(MaxTemperatureMapper.class);
>>     job.setReducerClass(MaxTemperatureReducer.class);
>>     job.setOutputKeyClass(Text.class);
>>     job.setOutputValueClass(IntWritable.class);
>>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>>   }
>> }
>>
>>
>>



-- 
Lance Norskog
goks...@gmail.com