Re: How does this work

2015-12-24 Thread Harsh J
Hue and Beeline access your warehouse data and metadata via the HiveServer2
APIs. The HiveServer2 service runs as the 'hive' user, so the HDFS permission
check on /user/hive/warehouse is made against 'hive', not against your own
account; your query is authorized by Sentry at the HiveServer2 layer instead.
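
For illustration, a minimal JDBC sketch of what Beeline and Hue do under the
hood (the host, port and realm below are hypothetical; the client authenticates
with its own Kerberos ticket via kinit, while HiveServer2 itself touches HDFS
as the 'hive' service user):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class Hs2Query {
      public static void main(String[] args) throws Exception {
        // 'principal' is the Kerberos principal of the HiveServer2 service,
        // not of the end user running this client.
        String url = "jdbc:hive2://hs2.example.com:10000/default;"
                   + "principal=hive/_HOST@EXAMPLE.COM";
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
          while (rs.next()) {
            System.out.println(rs.getString(1));
          }
        }
      }
    }

Sentry then decides whether the authenticated user may run the statement; the
HDFS permissions on /user/hive/warehouse are only ever checked against 'hive'.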

On Wed, Dec 23, 2015 at 9:42 PM Kumar Jayapal  wrote:

> Hi,
>
> My environment has Kerberos and Sentry for authentication and authorisation.
>
> We have the following permission on
>
> drwxrwx---   - hive hive  /user/hive/warehouse
>
> Now when I log in through Hue/Beeline, how am I able to access the data
> inside this directory when I don't belong to the hive group?
>
>
>
>
> Thanks
> Jay
>


Re: How does Kerberos work with Hadoop?

2013-02-22 Thread rohit sarewar
Thanks, Vinod, for the wonderful PDF.
It explains how security can be achieved via Kerberos.
Is there any other way to implement security in Hadoop (without using Kerberos)?


On Fri, Feb 22, 2013 at 2:39 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:


 You should read the Hadoop security design doc, which you can find at
 https://issues.apache.org/jira/browse/HADOOP-4487

 HTH,
 +Vinod Kumar Vavilapalli
 Hortonworks Inc.
 http://hortonworks.com/

 On Feb 21, 2013, at 11:02 AM, rohit sarewar wrote:


 I am looking for an explanation of how Kerberos works with a Hadoop cluster.
 I need to know how the KDC is used by HDFS and MapReduce.

 (Something like this: an example of Kerberos with a mail server,
 https://www.youtube.com/watch?v=KD2Q-2ToloE)

 How are the NameNode and DataNode vulnerable to attacks?
 What types of attack can occur?

 Please help!





How does Kerberos work with Hadoop?

2013-02-21 Thread rohit sarewar
I am looking for an explanation of how Kerberos works with a Hadoop cluster.
I need to know how the KDC is used by HDFS and MapReduce.

(Something like this: an example of Kerberos with a mail server,
https://www.youtube.com/watch?v=KD2Q-2ToloE)

How are the NameNode and DataNode vulnerable to attacks?
What types of attack can occur?

Please help!


Re: How does Kerberos work with Hadoop?

2013-02-21 Thread Vinod Kumar Vavilapalli

You should read the Hadoop security design doc, which you can find at
https://issues.apache.org/jira/browse/HADOOP-4487

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Feb 21, 2013, at 11:02 AM, rohit sarewar wrote:

 
 I am looking for an explanation of how Kerberos works with a Hadoop cluster.
 I need to know how the KDC is used by HDFS and MapReduce.

 (Something like this: an example of Kerberos with a mail server,
 https://www.youtube.com/watch?v=KD2Q-2ToloE)

 How are the NameNode and DataNode vulnerable to attacks?
 What types of attack can occur?

 Please help!
 



Re: How does decommissioning work

2012-09-21 Thread Artem Ervits
Is there a way to see the rate at which under-replicated blocks are being
propagated to other data nodes, besides running the fsck command, i.e. something
closer to real time?


Artem Ervits
Data Analyst
New York Presbyterian Hospital

From: Suresh Srinivas [mailto:sur...@hortonworks.com]
Sent: Wednesday, September 19, 2012 02:38 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: How does decommissioning work

Bryan,

I am going to assume that you know about the per-file replication factor, blocks,
replicas, etc.

Nodes are marked for decommissioning by adding them to the excludes file (by the
way, you mean dfs.hosts.exclude, right?). HDFS marks those datanodes as
decommissioning and no longer counts their replicas towards the replication
factor. This results in an increased number of under-replicated blocks. HDFS then
starts re-replicating those under-replicated blocks, preferring the
decommissioning nodes as the source as much as possible.

At this point, the decommissioning nodes are used for reads only. Decommissioning
completes when re-replication finishes and the replicas on those nodes are no
longer needed. The node is then marked decommissioned.
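
If you want to watch the state change from code rather than the NameNode UI,
here is a minimal sketch using the HDFS client API; it assumes the usual
core-site.xml/hdfs-site.xml are on the classpath and fs.defaultFS points at HDFS:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class DecommissionStatus {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cast is safe only when fs.defaultFS points at an HDFS cluster.
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // One entry per datanode known to the NameNode.
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
          String state = dn.isDecommissioned() ? "DECOMMISSIONED"
                       : dn.isDecommissionInProgress() ? "DECOMMISSIONING"
                       : "IN SERVICE";
          System.out.println(dn.getHostName() + " -> " + state);
        }
      }
    }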

Not sure I answered your questions.

Regards,
Suresh

On Tue, Sep 18, 2012 at 4:01 PM, Bryan Beaudreault
bbeaudrea...@hubspot.com wrote:
Hello,

I'm using cdh3u2, if it matters.  I'm using dfs.exclude.hosts to decommission a
good percentage of my cluster as I scale it down for a period of time.  I'm just
trying to understand how HDFS goes about this, because I haven't found anything
more than "how to use" documentation for the feature.

When I look at the NameNode UI, I see the under-replicated blocks count go up
when I decommission.  Also, when I look at the dfsnodelist page with
whatNodes=decommissioning, there are stats there like "Blocks with no live
replicas", etc.

When I decommission a node, does it immediately make that node unavailable,
hence these stats?  Or does it move the blocks off safely, and are these counts
just saying what would happen if the node were shut down without
decommissioning?  Something else?

Thanks for any insight.



--
http://hortonworks.com/download/










How does decommissioning work

2012-09-18 Thread Bryan Beaudreault
Hello,

I'm using cdh3u2, if it matters.  I'm using dfs.exclude.hosts to
decommission a good percentage of my cluster as I scale it down for a
period of time.  I'm just trying to understand how HDFS goes about this,
because I haven't found anything more than "how to use" documentation for
the feature.

When I look at the NameNode UI, I see the under-replicated blocks count go up
when I decommission.  Also, when I look at the dfsnodelist page with
whatNodes=decommissioning, there are stats there like "Blocks with no live
replicas", etc.

When I decommission a node, does it immediately make that node unavailable,
hence these stats?  Or does it move the blocks off safely, and are these
counts just saying what would happen if the node were shut down without
decommissioning?  Something else?

Thanks for any insight.


RE: how does hadoop work?

2009-12-21 Thread Ricky Ho
To my understanding, if the data resides in HDFS, then the JobTracker will make
use of block-location information when assigning tasks to TaskTrackers, and hence
can reduce the data movement between the data source and the Mapper.  Data
movement between the Mapper and the Reducer is harder to minimize (perhaps by
providing an application-specific partitioner; see the sketch below).
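
As a sketch of such an application-specific partitioner (the key format here is
hypothetical, and this uses the newer mapreduce API), one might route records by
a key prefix so that related records always reach the same Reducer:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Keys are assumed to look like "prefix:rest"; all records sharing a
    // prefix go to the same reduce partition.
    public class PrefixPartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
        String prefix = key.toString().split(":", 2)[0];
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

This cannot remove the shuffle traffic itself, but it keeps related keys
together and makes the volume per Reducer more predictable.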

Hadoop targets complex parallel processing algorithms, where the data-movement
overhead is relatively insignificant. It is not for everything.

Rgds,
Ricky
-Original Message-
From: Doopah Shaf [mailto:doopha.s...@gmail.com] 
Sent: Sunday, December 20, 2009 11:14 PM
To: common-user@hadoop.apache.org
Subject: how does hadoop work?

Trying to figure out how Hadoop actually achieves its speed. Assuming that
data locality is central to the efficiency of Hadoop, how does the magic
actually happen, given that data still gets moved all over the network to
reach the reducers?

For example, if I have 1 GB of logs spread across 10 data nodes, and, for the
sake of argument, assume I use the identity mapper. Then 90% of the data still
needs to move across the network - how does the network not become saturated
this way?

What did I miss?...
Thanks,
D.S.


Re: how does hadoop work?

2009-12-21 Thread Patrick Angeles
DS,

What you say is true, but there are finer points:

   1. Data transfer can begin while the mapper is working through the data.
   You would still bottleneck on the network if (a) you have enough nodes and
   spindles such that the aggregate disk transfer speed is greater than the
   network capacity, and (b) the computation is trivial such that you produce
   data faster than the network can sustain.
   2. Compression can work wonders. You also get a better compression ratio
   if you have a large map output key. (Imagine: {K,V1}, {K,V2}, {K,V3} gets
   transferred as {K, {V1, V2, V3}}.)
   3. In reality, most algorithms can be designed such that the map output
   is much smaller than the input data (e.g., count, sum, min, max, etc.).
   4. If you're doing a simple transformation where 1 line of input = 1 line
   of output (e.g., the identity mapper), then you can configure those to be
   map-only jobs, so there is no shuffle at all (see the sketch below).
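
Points 2 and 4 translate into a couple of lines of job setup. A minimal sketch
(newer mapreduce API; the paths and job name are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IdentityPassThrough {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point 2: compress intermediate map output for jobs that do shuffle
        // (older releases use the key "mapred.compress.map.output").
        conf.setBoolean("mapreduce.map.output.compress", true);

        Job job = Job.getInstance(conf, "identity pass-through");
        job.setJarByClass(IdentityPassThrough.class);
        job.setMapperClass(Mapper.class);  // the base Mapper is the identity mapper
        job.setNumReduceTasks(0);          // point 4: map-only, so no shuffle at all
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }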

Hope this helps,

- P
On Mon, Dec 21, 2009 at 2:14 AM, Doopah Shaf doopha.s...@gmail.com wrote:

 Trying to figure out how Hadoop actually achieves its speed. Assuming that
 data locality is central to the efficiency of Hadoop, how does the magic
 actually happen, given that data still gets moved all over the network to
 reach the reducers?

 For example, if I have 1 GB of logs spread across 10 data nodes, and, for the
 sake of argument, assume I use the identity mapper. Then 90% of the data still
 needs to move across the network - how does the network not become saturated
 this way?

 What did I miss?...
 Thanks,
 D.S.



how does hadoop work?

2009-12-20 Thread Doopah Shaf
Trying to figure out how Hadoop actually achieves its speed. Assuming that
data locality is central to the efficiency of Hadoop, how does the magic
actually happen, given that data still gets moved all over the network to
reach the reducers?

For example, if I have 1 GB of logs spread across 10 data nodes, and, for the
sake of argument, assume I use the identity mapper. Then 90% of the data still
needs to move across the network - how does the network not become saturated
this way?

What did I miss?...
Thanks,
D.S.


Re: How does org.apache.hadoop.mapred.join work?

2008-07-14 Thread Kevin
Thank you, Chris. That answers my questions.
-Kevin


On Mon, Jul 14, 2008 at 11:17 AM, Chris Douglas [EMAIL PROTECTED] wrote:
 "Yielding equal partitions" means that each input source will offer n
 partitions, and for any given partition 0 <= i < n, the records in that
 partition are (1) sorted on the same key and (2) unique to that partition, i.e. if
 a key k is in partition i for a given source, k appears in no other
 partitions from that source, and if any other source contains k, all
 occurrences appear in partition i of that source. All the framework really
 effects is the Cartesian product of all matching keys, so yes, that implies
 equi-joins.

 It's a fairly strict requirement. Satisfying it is less onerous if one is
 joining the output of several m/r jobs, each of which uses the same
 keys/partitioner, the same number of reduces, and each output file
 (part-x) of each job is not splittable. In this case, n is equal to the
 number of output files from each job (the number of reduces), (1) is
 satisfied if the reduce emits records in the same order (i.e. no new keys,
 no records out of order), and (2) is guaranteed by the partitioner and (1).

 An InputFormat capable of parsing metadata about each source to generate
 partitions from the set of input sources is ideal, but I can point to no
 existing implementation. -C
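
 For reference, a minimal old-API sketch of how CompositeInputFormat is
 typically wired up for such a map-side join (the input paths are placeholders,
 and both inputs must already satisfy the contract described above):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.join.CompositeInputFormat;
    import org.apache.hadoop.mapred.join.TupleWritable;

    public class MapSideJoin {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(MapSideJoin.class);
        job.setJobName("map-side inner join");

        // The join happens in the InputFormat, before the map.
        job.setInputFormat(CompositeInputFormat.class);
        job.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", KeyValueTextInputFormat.class,
            "/data/sourceA", "/data/sourceB"));

        // Each map input value is a TupleWritable holding one value per source
        // for the matching key; with no mapper/reducer set and zero reduces,
        // the joined tuples are written straight out by the default identity map.
        job.setNumReduceTasks(0);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TupleWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/joined"));

        JobClient.runJob(job);
      }
    }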

 On Jul 14, 2008, at 9:20 AM, Kevin wrote:

 Hi,

 I find limited information about this package, which looks like it could
 do equi-joins. "Given a set of sorted datasets keyed with the same
 class and yielding equal partitions, it is possible to effect a join
 of those datasets prior to the map."  What does "yielding equal
 partitions" mean?

 Thank you.

 -Kevin