Re: Directed hdfs block reads

2017-07-23 Thread Harsh J
There currently isn't an API way to hint at or select the DNs to read from -
you may need to make manual changes (a contribution of such a feature is
welcome; please file a JIRA to submit a proposal).

You can perhaps hook in your own control over which replica location is
selected by the reader for a given block via the non-public method
DFSInputStream#getBestNodeDNAddrPair(…):
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L982-L1021
(ensure you preserve the existing logic around edge cases, however).
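
For illustration only, a minimal sketch (every name here is hypothetical,
not part of Hadoop) of the kind of helper a patched
getBestNodeDNAddrPair() could call on block.getLocations() before its
existing selection loop, so the dead-node and ignored-node checks still
run unchanged over the reordered array:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

/**
 * Hypothetical helper, NOT part of Hadoop: reorders a block's replica
 * locations so that caller-preferred hosts are tried first, keeping the
 * NameNode's original ordering within each group.
 */
class PreferredReplicaOrder {
  static DatanodeInfo[] reorder(DatanodeInfo[] locs,
                                Set<String> preferredHosts) {
    List<DatanodeInfo> preferred = new ArrayList<>();
    List<DatanodeInfo> others = new ArrayList<>();
    for (DatanodeInfo dn : locs) {
      if (preferredHosts.contains(dn.getHostName())) {
        preferred.add(dn);
      } else {
        others.add(dn);
      }
    }
    preferred.addAll(others); // preserve NameNode order within each group
    return preferred.toArray(new DatanodeInfo[0]);
  }
}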

Note, though, that the block replica location list the NameNode returns for
off-rack reads is randomized by default. Are you observing a non-random
distribution of reads?
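
For the planning half of the question quoted below, the replica locations
themselves are already available through the stable, public FileSystem
API, so the dispatch plan can be built without touching DFSInputStream.
A minimal sketch (the path argument and hdfs:// URI are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationPlan {
  public static void main(String[] args) throws Exception {
    // Placeholder: pass e.g. hdfs://remote-nn:8020/data/file
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(new Configuration());
    FileStatus stat = fs.getFileStatus(path);
    // Public API: per-block offset, length, and replica hostnames,
    // which is enough input for an IO-balancing dispatch plan.
    for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          loc.getOffset(), loc.getLength(), String.join(",", loc.getHosts()));
    }
  }
}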

On Wed, 19 Jul 2017 at 06:16 Shivram Mani  wrote:

> We have an application which uses the DFSInputStream to read blocks from a
> *remote* Hadoop cluster. Is there any way we can influence which specific
> datanode the block fetch request is dispatched to?
>
> The reasoning behind this is that our application workload is very heavy
> on IO, so we would like to distribute the IO load as evenly as possible
> across the hosts/disks. Hence, prior to reading data, we wish to obtain the
> locations of the underlying blocks and build a dispatch plan so as to
> maximize the IO throughput on the HDFS cluster.
>
> How do we go about this?
>
>
> --
> Thanks
> Shivram
>


Re: Kerberised JobHistory Server not starting: User jhs trying to create the /mr-history/done directory

2017-07-23 Thread Kevin Buckley
On 21 July 2017 at 13:25, Kevin Buckley
 wrote:
> On 21 July 2017 at 04:04, Erik Krogen  wrote:
>> Hi Kevin,
>>
>> Since you are using the "jhs" keytab with principal "jhs/_h...@realm.tld",
>> the JHS is authenticating itself as the jhs user (which is the actual
>> important part, rather than the user the process is running as). If you want
>> it to be the "mapred" user, you should change the keytab/principal you use
>> (mapred.jobhistory.{principal,keytab}).
>
> I'll certainly give that a go, Erik; however, the way I read the
>
>>> The hadoop-2.8.0  docs SecureMode page also suggests that one would need to
>>> play around with the
>>>
>>> hadoop.security.auth_to_local
>
> bits suggested to me that if you set things up such that
>
> ===
> $ hadoop org.apache.hadoop.security.HadoopKerberosName
> jhs/co246a-9.ecs.vuw.ac...@ecs.vuw.ac.nz
> 17/07/20 17:42:50 INFO util.KerberosName: Non-simple name
> mapred/co246a-9.ecs.vuw.ac...@ecs.vuw.ac.nz after auth_to_local rule
> RULE:[2:$1/$2@$0](jhs/.*)s/jhs/mapred/
> Name: jhs/co246a-9.ecs.vuw.ac...@ecs.vuw.ac.nz to
> mapred/co246a-9.ecs.vuw.ac...@ecs.vuw.ac.nz
> ===
>
> (or even used a rule that just mapped the principal to a simple "mapred",
> because I tried that too!) told you it was remapping the user, then it would
> remap it for all instances of the user within the Hadoop instance.
>
> Let's see.

OK,

so it would appear that, despite the Hadoop docs suggesting that you only
need the three usernames 'hdfs', 'yarn' and 'mapred', if you do use the
principal from the docs, which has the jhs component, then even if you do
try to map users using 'hadoop.security.auth_to_local', your JobHistory
Server will start up inside Hadoop running as a 'jhs' user.
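
Presumably, then, the fix is Erik's suggestion: point the JHS at a mapred
principal in mapred-site.xml, something like the below (the realm and
keytab path are placeholders for our local values; note the full property
names are mapreduce.jobhistory.principal and mapreduce.jobhistory.keytab):

<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>mapred/_HOST@REALM.TLD</value>  <!-- placeholder realm -->
</property>
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/security/keytab/mapred.service.keytab</value>  <!-- placeholder path -->
</property>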

That would seem to be a bit of a trap for the unaware/unwary that the docs
could easily improve upon?

Thanks again for the pointer to the correct interpretation of the docs, Erik,
Kevin

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Invitation for a Hadoop Presenter at HPC China 2017, Hefei City, China, 21st October

2017-07-23 Thread Forrest.Wc.Ling
Dear Hadoop:

My name is Forrest Ling. I am an HPC Solution Architect at Dell China and a 
member of CCF TCHPC (China Computer Federation, Technical Committee of High 
Performance Computing, http://www.ccf.org.cn/zw/zwmd/gxnjs/). HPC China 2017 
(http://hpcchina2017.csp.escience.cn/dct/page/1), the biggest annual HPC 
conference in China, will be organized by CCF TCHPC. I am arranging the speech 
slots for the "HPC Open Source Software Stack" Forum on the morning of 21st 
October at HPC China 2017.

We plan to arrange 6-8 speech slots for the Forum, and are now inviting the 
Linux Foundation, OpenHPC, OpenPBSpro, Lustre, etc. to present in the Forum.

Because Big Data is merging with HPC/DL, I would like to invite Apache Hadoop 
to deliver a speech in the "HPC Open Source Software Stack" Forum at HPC China 
2017 in Hefei City on the morning of 21st October.

Could you please advise a contact at Apache Hadoop, or check whether Apache 
Hadoop can assign/authorize a presenter to speak on Apache Hadoop in the HPC 
Open Source Software Stack Forum at HPC China 2017 in Hefei City, China, on 
21st October?

We can cover hotel/meal expenses for the presenter at HPC China 2017 during 
19-21 October. We can also provide official invitation letters for overseas 
presenters to obtain a visa to China.

I am looking forward to hearing from you.


Thanks,
Forrest Ling (凌巍才)
+86 18600622522
Dell HPC Product Technologist Greater China



Re: Hadoop Issues - Can't find on StackOverflow

2017-07-23 Thread Jeff Zhang
I would suggest users run Hadoop on Linux, and not waste time on these
OS-related issues.
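
That said, if it is actually Linux under the covers: every failing path in
the log below hangs off /opt/hadoop/bin/yarn.cmd, a script file, being used
as a directory. Assuming the stock yarn-daemon.sh, which falls back to
$HADOOP_YARN_HOME/logs when YARN_LOG_DIR is unset, a first check might be:

echo "HADOOP_YARN_HOME=$HADOOP_YARN_HOME"
echo "YARN_LOG_DIR=$YARN_LOG_DIR"
# If either prints /opt/hadoop/bin/yarn.cmd, repoint it at the real
# install and log directories (paths below are hypothetical):
export HADOOP_YARN_HOME=/root/hadoop-2.7.2
export YARN_LOG_DIR=/root/hadoop-2.7.2/logs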



zlgonzalez wrote on Sunday, 23 July 2017 at 12:06 PM:

> Looks like you're running on Windows. Not sure if Hadoop running on
> Windows, even for single nodes, is still supported...
>
> Thanks,
> Ron
>
>
>
> Sent via the Samsung Galaxy S7 active, an AT&T 4G LTE smartphone
>
> -------- Original message --------
> From: "संजीव (Sanjeev Tripurari)" 
> Date: 7/22/17 8:25 PM (GMT-08:00)
> To: johnsonngu...@ups.com.invalid
> Cc: user@hadoop.apache.org
> Subject: Re: Hadoop Issues - Can't find on StackOverflow
>
> Hi,
>
> Can you share the environment where you did the setup.
> - Which distribution
> - OS
> - Hadoop version
>
> Regards
> -Sanjeev
>
>
> On 19 July 2017 at 00:36,  wrote:
>
>> Hi Hadoop People,
>>
>>
>>
>>  I cannot get my YARN to run for my single node cluster. The error I
>> receive when I run start-yarn.sh is:
>>
>> starting yarn daemons
>>
>> mkdir: cannot create directory ‘/opt/hadoop/bin/yarn.cmd’: Not a directory
>>
>> chown: cannot access ‘/opt/hadoop/bin/yarn.cmd/logs’: Not a directory
>>
>> starting resourcemanager, logging to
>> /opt/hadoop/bin/yarn.cmd/logs/yarn-vkq8pyw-resourcemanager-master.out
>>
>> /root/hadoop-2.7.2/sbin/yarn-daemon.sh: line 123: cd:
>> /opt/hadoop/bin/yarn.cmd: Not a directory
>>
>> /root/hadoop-2.7.2/sbin/yarn-daemon.sh: line 124:
>> /opt/hadoop/bin/yarn.cmd/logs/yarn-vkq8pyw-resourcemanager-master.out: Not
>> a directory
>>
>> head: cannot open
>> ‘/opt/hadoop/bin/yarn.cmd/logs/yarn-vkq8pyw-resourcemanager-master.out’ for
>> reading: Not a directory
>>
>> /root/hadoop-2.7.2/sbin/yarn-daemon.sh: line 129:
>> /opt/hadoop/bin/yarn.cmd/logs/yarn-vkq8pyw-resourcemanager-master.out: Not
>> a directory
>>
>> /root/hadoop-2.7.2/sbin/yarn-daemon.sh: line 130:
>> /opt/hadoop/bin/yarn.cmd/logs/yarn-vkq8pyw-resourcemanager-master.out: Not
>> a directory
>>
>>
>>
>>
>>
>> If there is a fix to this, please do tell. Thank you for your help and
>> reading.
>>
>>
>>
>> Best Regards,
>>
>> Johnson Nguyen
>>
>
>