Re: How does this work
Hue and Beeline access your warehouse data and metadata via the HiveServer2 APIs, and the HiveServer2 service runs as the 'hive' user. The HDFS permission check on /user/hive/warehouse is therefore made against 'hive', not against your own account, which is why you can reach the data without belonging to the hive group; Sentry authorizes the query itself. (A connection sketch follows below.)

On Wed, Dec 23, 2015 at 9:42 PM Kumar Jayapal wrote:
> Hi,
>
> My environment has Kerberos and Sentry for authentication and authorisation.
>
> We have the following permission on the warehouse directory:
>
> drwxrwx--- - hive hive /user/hive/warehouse
>
> Now, when I log in through Hue/Beeline, how am I able to access the data
> inside this directory when I don't belong to the hive group?
>
> Thanks
> Jay
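To make the mechanics above concrete: Beeline talks to HiveServer2 over JDBC, authenticating the end user with Kerberos, while HiveServer2 itself touches HDFS as 'hive' (when impersonation, hive.server2.enable.doAs, is disabled). A minimal sketch; the host, port, and principal are placeholders for your environment:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class HiveJdbcSketch {
        public static void main(String[] args) throws Exception {
            // Register the HiveServer2 JDBC driver.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // You authenticate as yourself via your Kerberos ticket (kinit first);
            // HiveServer2 then reads /user/hive/warehouse as the 'hive' user,
            // so the drwxrwx--- hive:hive permissions never apply to you directly.
            String url = "jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@EXAMPLE.COM";
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.createStatement().execute("SELECT 1");
            }
        }
    }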
Re: How does Kerberos work with Hadoop?
Thanks, Vinod, for the wonderful PDF. It explains how security can be achieved via Kerberos. Is there any other way to implement security in Hadoop (without using Kerberos)?

On Fri, Feb 22, 2013 at 2:39 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

You should read the Hadoop security design doc, which you can find at https://issues.apache.org/jira/browse/HADOOP-4487

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Feb 21, 2013, at 11:02 AM, rohit sarewar wrote:

I am looking for an explanation of how Kerberos works with a Hadoop cluster. I need to know how the KDC is used by HDFS and MapReduce (something like this example of Kerberos with a mail server: https://www.youtube.com/watch?v=KD2Q-2ToloE). How are the name node and data node prone to attacks? What types of attack can occur? Please help!
How does Kerberos work with Hadoop?
I am looking for an explanation of how Kerberos works with a Hadoop cluster. I need to know how the KDC is used by HDFS and MapReduce (something like this example of Kerberos with a mail server: https://www.youtube.com/watch?v=KD2Q-2ToloE). How are the name node and data node prone to attacks? What types of attack can occur? Please help!
Re: How does Kerberos work with Hadoop?
You should read the Hadoop security design doc, which you can find at https://issues.apache.org/jira/browse/HADOOP-4487. (A client-side login sketch follows this message.)

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Feb 21, 2013, at 11:02 AM, rohit sarewar wrote:

I am looking for an explanation of how Kerberos works with a Hadoop cluster. I need to know how the KDC is used by HDFS and MapReduce (something like this example of Kerberos with a mail server: https://www.youtube.com/watch?v=KD2Q-2ToloE). How are the name node and data node prone to attacks? What types of attack can occur? Please help!
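To sketch the client side of what the design doc describes: a Hadoop client first obtains a ticket from the KDC (via kinit or a keytab), authenticates its RPCs to the NameNode with Kerberos, and then reaches DataNodes using block access tokens issued by the NameNode. A minimal keytab login with the standard UserGroupInformation API; the principal, keytab path, and HDFS path are placeholders, and fs.defaultFS is assumed to point at the secure cluster:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberizedHdfsClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Fetch a TGT from the KDC using the keytab (no password prompt).
            UserGroupInformation.loginUserFromKeytab(
                "alice@EXAMPLE.COM", "/etc/security/keytabs/alice.keytab");
            // All RPCs below are now authenticated with Kerberos.
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/user/alice")));
        }
    }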
Re: How does decommissioning work
Is there a way to see the rate at which under-replicated blocks are being propagated to other data nodes besides running the fsck command, meaning more in real time? (See the JMX sketch after this message.)

Artem Ervits
Data Analyst
New York Presbyterian Hospital

From: Suresh Srinivas [mailto:sur...@hortonworks.com]
Sent: Wednesday, September 19, 2012 02:38 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: How does decommissioning work

Bryan, I am going to assume that you know about replication factor per file, blocks, replicas, etc. Nodes are marked for decommissioning by adding them to the excludes file (btw, you mean dfs.hosts.exclude, right?). HDFS marks these datanodes as decommissioning and no longer counts their replicas towards the replication factor, which results in an increased number of under-replicated blocks. HDFS then starts replicating these under-replicated blocks, preferring the decommissioning nodes as the source as much as possible. During this time, the decommissioning nodes are used for reads only. Decommissioning completes when replication completes and the replicas on those nodes are no longer needed; the node is then marked decommissioned.

Not sure I answered your questions.

Regards,
Suresh

On Tue, Sep 18, 2012 at 4:01 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote:

Hello, I'm using cdh3u2, if it matters. I'm using dfs.exclude.hosts to decommission a good percentage of my cluster as I scale it down for a period of time. I'm just trying to understand how HDFS goes about this, because I haven't found anything more than how-to documentation for the feature. When I look at the name node UI, I see the under-replicated block count go up when I decommission. Also, when I look at the dfsnodelist page with whatNodes=decommissioning, there are stats there like 'Blocks with no live replicas', etc. When I decommission a node, does it immediately make that node unavailable, hence these stats? Or does it move the blocks off safely, and do these counts just say what would happen if the node were shut down without decommissioning? Something else? Thanks for any insight.

--
http://hortonworks.com/download/
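On Artem's follow-up: the NameNode publishes the under-replicated block count as a metric (the same number shown on the web UI), so one way to watch it in near real time is to poll it over JMX and difference successive samples. A rough sketch; the JMX port and the MBean/attribute names are assumptions and vary by Hadoop version:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class UnderReplicatedPoller {
        public static void main(String[] args) throws Exception {
            // Assumes remote JMX is enabled on the NameNode at this host/port.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://namenode-host:8004/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                ObjectName fsState = new ObjectName(
                    "Hadoop:service=NameNode,name=FSNamesystemState");
                long prev = -1;
                while (true) {
                    long cur = ((Number) mbs.getAttribute(
                        fsState, "UnderReplicatedBlocks")).longValue();
                    if (prev >= 0) {
                        System.out.printf("under-replicated: %d (%+d in 10s)%n",
                            cur, cur - prev);
                    }
                    prev = cur;
                    Thread.sleep(10000);
                }
            }
        }
    }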
How does decommissioning work
Hello, I'm using cdh3u2, if it matters. I'm using dfs.exclude.hosts to decommission a good percentage of my cluster as I scale it down for a period of time. I'm just trying to understand how HDFS goes about this, because I haven't found anything more than how-to documentation for the feature. When I look at the name node UI, I see the under-replicated block count go up when I decommission. Also, when I look at the dfsnodelist page with whatNodes=decommissioning, there are stats there like 'Blocks with no live replicas', etc. When I decommission a node, does it immediately make that node unavailable, hence these stats? Or does it move the blocks off safely, and do these counts just say what would happen if the node were shut down without decommissioning? Something else? Thanks for any insight.
RE: how does hadoop work?
To my understanding, if data resides in HDFS, the JobTracker will make use of location information to assign tasks to TaskTrackers near the data, and hence can reduce data movement between the data source and the Mapper. Data movement between the Mapper and Reducer is harder to minimize (perhaps by providing an application-specific partitioner; a sketch follows this message). Hadoop targets parallel processing with complex algorithms, where the data movement overhead is relatively insignificant. It is not for everything.

Rgds, Ricky

-----Original Message-----
From: Doopah Shaf [mailto:doopha.s...@gmail.com]
Sent: Sunday, December 20, 2009 11:14 PM
To: common-user@hadoop.apache.org
Subject: how does hadoop work?

Trying to figure out how Hadoop actually achieves its speed. Assuming that data locality is central to the efficiency of Hadoop, how does the magic actually happen, given that data still gets moved all over the network to reach the reducers? For example, say I have 1 GB of logs spread across 10 data nodes and, for the sake of argument, assume I use the identity mapper. Then 90% of the data still needs to move across the network - how does the network not become saturated this way? What did I miss?

Thanks, D.S.
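To make Ricky's aside about an application-specific partitioner concrete: you control which reducer each map output record travels to by overriding the partitioner, which can keep related records together and reduce shuffle skew. A sketch using the old mapred API of this era; the "host:path" key layout is an assumption for illustration:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class HostPartitioner implements Partitioner<Text, IntWritable> {
        // Route records by the host prefix of a "host:path" key so all records
        // for one host land on the same reducer.
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String host = key.toString().split(":", 2)[0];
            return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        public void configure(JobConf job) { }
    }

It is wired in with conf.setPartitionerClass(HostPartitioner.class).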
Re: how does hadoop work?
DS, what you say is true, but there are finer points:

1. Data transfer can begin while the mapper is working through the data. You would still bottleneck on the network if: (a) you have enough nodes and spindles that the aggregate disk transfer speed is greater than the network capacity, and (b) the computation is trivial, such that you produce data faster than the network can sustain.

2. Compression can work wonders. You also get a better compression ratio if you have a large map output key. (Imagine: {K,V1}, {K,V2}, {K,V3} gets transferred as {K, {V1, V2, V3}}.)

3. In reality, most algorithms can be designed such that the map output is much smaller than the input data (e.g., count, sum, min, max, etc.).

4. If you're doing a simple transformation where 1 line of input = 1 line of output (e.g., the identity mapper), then you can configure it as a map-only job, thus no shuffle. (A configuration sketch follows this message.)

Hope this helps,
- P

On Mon, Dec 21, 2009 at 2:14 AM, Doopah Shaf doopha.s...@gmail.com wrote:

Trying to figure out how Hadoop actually achieves its speed. Assuming that data locality is central to the efficiency of Hadoop, how does the magic actually happen, given that data still gets moved all over the network to reach the reducers? For example, say I have 1 GB of logs spread across 10 data nodes and, for the sake of argument, assume I use the identity mapper. Then 90% of the data still needs to move across the network - how does the network not become saturated this way? What did I miss?

Thanks, D.S.
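The knobs behind points 2 and 4 are ordinary JobConf settings (old mapred API, matching this era). A sketch; the class name is illustrative:

    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.JobConf;

    public class JobTuning {
        public static void main(String[] args) {
            JobConf conf = new JobConf(JobTuning.class);
            // Point 4: zero reduces makes this a map-only job; map output is
            // written straight to HDFS and there is no shuffle at all.
            conf.setNumReduceTasks(0);
            // Point 2 (for jobs that do keep reducers): compress intermediate
            // map output before it crosses the network.
            conf.setCompressMapOutput(true);
            conf.setMapOutputCompressorClass(GzipCodec.class);
        }
    }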
how does hadoop work?
Trying to figure out how Hadoop actually achieves its speed. Assuming that data locality is central to the efficiency of Hadoop, how does the magic actually happen, given that data still gets moved all over the network to reach the reducers? For example, say I have 1 GB of logs spread across 10 data nodes and, for the sake of argument, assume I use the identity mapper. Then 90% of the data still needs to move across the network - how does the network not become saturated this way? What did I miss?

Thanks, D.S.
Re: How does org.apache.hadoop.mapred.join work?
Thank you, Chris. This solves my questions. -Kevin

On Mon, Jul 14, 2008 at 11:17 AM, Chris Douglas [EMAIL PROTECTED] wrote:

Yielding equal partitions means that each input source will offer n partitions, and for any given partition 0 <= i < n, the records in that partition are (1) sorted on the same key and (2) unique to that partition, i.e. if a key k is in partition i for a given source, k appears in no other partition from that source, and if any other source contains k, all occurrences appear in partition i from that source.

All the framework really effects is the cartesian product of all matching keys, so yes, that implies equi-joins. It's a fairly strict requirement. Satisfying it is less onerous if one is joining the output of several m/r jobs, each of which uses the same keys/partitioner and the same number of reduces, and each output file (part-x) of each job is not splittable. In this case, n is equal to the number of output files from each job (the number of reduces), (1) is satisfied if the reduce emits records in the same order (i.e. no new keys, no records out of order), and (2) is guaranteed by the partitioner and (1).

An InputFormat capable of parsing metadata about each source to generate partitions from the set of input sources is ideal, but I can point to no existing implementation. (A configuration sketch follows this message.)

-C

On Jul 14, 2008, at 9:20 AM, Kevin wrote:

Hi, I find limited information about this package, which looks like it could do equi-joins. "Given a set of sorted datasets keyed with the same class and yielding equal partitions, it is possible to effect a join of those datasets prior to the map." What does "yielding equal partitions" mean? Thank you. -Kevin
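For how this is wired up in practice: the join is driven through CompositeInputFormat and a join expression. A minimal sketch of an inner (equi-)join over two inputs that satisfy the partition conditions Chris describes; the paths are placeholders:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.join.CompositeInputFormat;

    public class JoinSetup {
        public static void main(String[] args) {
            JobConf conf = new JobConf(JoinSetup.class);
            conf.setInputFormat(CompositeInputFormat.class);
            // "inner" yields the cartesian product of matching keys across the
            // sources; /data/a and /data/b stand for outputs of jobs that used
            // the same keys, partitioner, and number of reduces.
            conf.set("mapred.join.expr", CompositeInputFormat.compose(
                "inner", KeyValueTextInputFormat.class, "/data/a", "/data/b"));
            // The map then receives a TupleWritable per key, holding one value
            // from each joined source.
        }
    }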