Re: Compression codec com.hadoop.compression.lzo.LzoCodec not found

2014-02-12 Thread Zhijie Shen
For the codecs, you can choose
among org.apache.hadoop.io.compress.*Codec. LzoCodec has been moved out of
Hadoop (see HADOOP-4874).
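
For example, a minimal core-site.xml sketch that keeps only codecs shipped with Hadoop (a sketch assuming you don't actually need LZO; which built-in codecs you list is up to you):

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>

Dropping com.hadoop.compression.lzo.LzoCodec from that list (or installing hadoop-lzo so the class is actually on the classpath) should make the exception quoted below go away.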

- Zhijie


On Wed, Feb 12, 2014 at 10:54 AM, Ted Yu  wrote:

> What's the value for "io.compression.codecs" config parameter ?
>
> Thanks
>
>
> On Tue, Feb 11, 2014 at 10:11 PM, Li Li  wrote:
>
>> I am running the wordcount example but encountered an exception:
>> I googled and learned that lzo compression's license is incompatible with
>> apache's, so it's not built in.
>> The question is: I am using the default configuration of hadoop 1.2.1, so why
>> does it need lzo?
>> Another question is: what does "Cleaning up the staging area" mean?
>>
>>
>> ./bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /lili/data.txt
>> /lili/test
>>
>> 14/02/12 14:06:10 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/02/12 14:06:10 INFO mapred.JobClient: Cleaning up the staging area
>> hdfs://
>> 172.19.34.24:8020/home/hadoop/dfsdir/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201401080916_0216
>> java.lang.IllegalArgumentException: Compression codec
>> com.hadoop.compression.lzo.LzoCodec not found.
>> at
>> org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:116)
>> at
>> org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:156)
>> at
>> org.apache.hadoop.mapreduce.lib.input.TextInputFormat.isSplitable(TextInputFormat.java:47)
>> at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:258)
>> at
>> org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
>> at
>> org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
>> at
>> org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
>> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
>> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>> at
>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>> at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
>> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
>> at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:601)
>> at
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> at
>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> at
>> org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:601)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>> Caused by: java.lang.ClassNotFoundException:
>> com.hadoop.compression.lzo.LzoCodec
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:264)
>> at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
>> at
>> org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:109)
>>
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/


Re: job submission between 2 YARN clusters

2014-02-13 Thread Zhijie Shen
Hi Anfernee,

It sounds most likely that the config is somehow corrupted. You have two sets of
config to start the two YARN clusters separately, don't you? If you provide more
detail about how you configured the two clusters, it will be easier for the
community to understand your problem.

- Zhijie


On Thu, Feb 13, 2014 at 11:34 AM, Anfernee Xu  wrote:

> I'm on the YARN 2.2.0 release. I configured 2 single-node clusters on my
> laptop (just for a POC; all port conflicts are resolved, I can see the NM
> and RM are up, and the web UI shows everything is fine), and I also have a
> standalone java application. The java application is a kind of job client: it
> will submit job1 to Cluster #1, and once that job is finished, it will submit
> another job2 to Cluster #2.
>
> What I'm seeing is that job1 does fine, but job2 fails. I looked at the
> source code and found the NM in cluster 2 was talking to cluster 1's RM via the
> wrong yarn.resourcemanager.scheduler.address. How does that happen? I just want
> to make sure there's no such issue in a real deployment.
>
> --
> --Anfernee
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: job submission between 2 YARN clusters

2014-02-14 Thread Zhijie Shen
You need to set the following configs differently for the two clusters:
"yarn.resourcemanager.resource-tracker.address": the NMs talk to this address
"yarn.resourcemanager.scheduler.address": your ApplicationMaster talks to this address
"yarn.resourcemanager.address": your client talks to this address

Of course, the NMs need to be started at different "yarn.nodemanager.address"
values in your two clusters.
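
For example, a minimal yarn-site.xml sketch for the second cluster (the ports below are arbitrary placeholders; any free ports that differ from cluster #1's will do):

<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:18032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:18030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:18031</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>localhost:18040</value>
</property>

Also make sure your client builds its Configuration from this cluster's files, rather than reusing cluster #1's, before submitting job2.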

- Zhijie

On Thu, Feb 13, 2014 at 4:59 PM, Anfernee Xu  wrote:

> Hi Zhijie,
>
> I agree. What I'm doing in the standalone app is that the app loads the
> first cluster's Configuration (mapred-site.xml, yarn-site.xml) as its default
> configuration, and then submits an MR job with this configuration to the first
> cluster; after the job is finished, I submit the second job to the
> second cluster with almost the same Configuration, except that I changed the
> property yarn.resourcemanager.address to point to the second cluster's RM.
> My guess is that the job.xml of the second job holds all the property values of
> the first cluster (such as yarn.resourcemanager.scheduler.address), and these
> override the properties specified in the second cluster (in yarn-site.xml, for
> example); therefore it talks to the wrong RM when the NM is launching the
> container.
>
> Please comment.
>
> BTW, I just tweaked the standalone app so that it loads the second
> cluster's configuration (yarn-site.xml) before submitting the second job, and
> it seems to be working.
>
> Thanks
>
>
> On Thu, Feb 13, 2014 at 4:28 PM, Zhijie Shen wrote:
>
>> Hi Anfernee,
>>
>> It sounds most likely that the config is somehow corrupted. You have two sets
>> of config to start the two YARN clusters separately, don't you? If you provide
>> more detail about how you configured the two clusters, it will be easier for
>> the community to understand your problem.
>>
>> - Zhijie
>>
>>
>> On Thu, Feb 13, 2014 at 11:34 AM, Anfernee Xu wrote:
>>
>>> I'm on the YARN 2.2.0 release. I configured 2 single-node clusters on my
>>> laptop (just for a POC; all port conflicts are resolved, I can see the NM
>>> and RM are up, and the web UI shows everything is fine), and I also have a
>>> standalone java application. The java application is a kind of job client: it
>>> will submit job1 to Cluster #1, and once that job is finished, it will submit
>>> another job2 to Cluster #2.
>>>
>>> What I'm seeing is that job1 does fine, but job2 fails. I looked at the
>>> source code and found the NM in cluster 2 was talking to cluster 1's RM via
>>> the wrong yarn.resourcemanager.scheduler.address. How does that happen? I just
>>> want to make sure there's no such issue in a real deployment.
>>>
>>> --
>>> --Anfernee
>>>
>>
>>
>>
>> --
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>
>
> --
> --Anfernee
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Aggregation service start

2014-02-16 Thread Zhijie Shen
Please set

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

in yarn-site.xml to enable log aggregation.

-Zhijie
 On Feb 16, 2014 6:15 PM, "EdwardKing"  wrote:

> hadoop 2.2.0. I want to view the Tracking UI, so I visit
> http://172.11.12.6:8088/cluster,
> then I click History for a Completed Job, such as the following:
>
> MapReduce Job job_1392601388579_0001
> Attempt Number   Start Time                    Node         Logs
> 1                Sun Feb 16 17:44:57 PST 2014  master:8042  logs
>
> Then I click logs, but it fails with:
> Aggregation is not enabled. Try the nodemanager at master:8994
>
> I guess a service must not have been started; which command do I need to
> execute under home/software/hadoop-2.2.0/sbin?  Thanks.
> [hadoop@node1 sbin]$ ls
> distribute-exclude.sh    start-all.cmd        stop-all.sh
> hadoop-daemon.sh         start-all.sh         stop-balancer.sh
> hadoop-daemons.sh        start-balancer.sh    stop-dfs.cmd
> hdfs-config.cmd          start-dfs.cmd        stop-dfs.sh
> hdfs-config.sh           start-dfs.sh         stop-secure-dns.sh
> httpfs.sh                start-secure-dns.sh  stop-yarn.cmd
> mr-jobhistory-daemon.sh  start-yarn.cmd       stop-yarn.sh
> refresh-namenodes.sh     start-yarn.sh        yarn-daemon.sh
> slaves.sh                stop-all.cmd         yarn-daemons.sh
>



Re: Aggregation service start

2014-02-16 Thread Zhijie Shen
"But when the job complete, then I click History of Tracking UI
http://172.11.12.6:8088/cluster again, it raise following error:

Firefox can't establish a connection to the server at master:19888."

This is a problem other than log aggregation. After a MapReduce job
completes, the tracking URL points to the MapReduce history server. It
is very likely that the history server isn't running on your machine,
so you didn't get a response from it.
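
If that's the case, starting it should fix the link. A sketch, assuming the default layout of your install (the script appears in your sbin listing, and 19888 is the default web port of the MapReduce JobHistoryServer):

  [hadoop@master hadoop-2.2.0]$ sbin/mr-jobhistory-daemon.sh start historyserver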

- Zhijie


On Sun, Feb 16, 2014 at 10:03 PM, EdwardKing  wrote:

> Thanks for your help. I set yarn-site.xml as you told me, like the following:
>
> [hadoop@master hadoop]$ cat yarn-site.xml
> <?xml version="1.0"?>
> <configuration>
>
> <property>
>   <name>yarn.resourcemanager.resource-tracker.address</name>
>   <value>master:8990</value>
>   <description>host is the hostname of the resource manager and port is
> the port on which the NodeManagers contact the Resource Manager.</description>
> </property>
> <property>
>   <name>yarn.resourcemanager.scheduler.address</name>
>   <value>master:8991</value>
>   <description>host is the hostname of the resourcemanager and port is the
> port on which the Applications in the cluster talk to the Resource Manager.</description>
> </property>
> <property>
>   <name>yarn.resourcemanager.scheduler.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
>   <description>In case you do not want to use the default scheduler</description>
> </property>
> <property>
>   <name>yarn.resourcemanager.address</name>
>   <value>master:8993</value>
>   <description>the host is the hostname of the ResourceManager and port is
> the port on which the clients can talk to the Resource Manager</description>
> </property>
>
> <property>
>   <name>yarn.nodemanager.local-dirs</name>
>   <value>/home/software/tmp/node</value>
>   <description>the local directories used by the nodemanager</description>
> </property>
> <property>
>   <name>yarn.nodemanager.address</name>
>   <value>master:8994</value>
>   <description>the nodemanagers bind to this port</description>
> </property>
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>5120</value>
>   <description>the amount of memory on the NodeManager in MB</description>
> </property>
> <property>
>   <name>yarn.nodemanager.remote-app-log-dir</name>
>   <value>/home/software/tmp/app-logs</value>
>   <description>directory on hdfs where the application logs are moved to</description>
> </property>
>
> <property>
>   <name>yarn.nodemanager.log-dirs</name>
>   <value>/home/software/tmp/node</value>
>   <description>the directories used by the Nodemanager as log directories</description>
> </property>
> <property>
>   <name>yarn.nodemanager.aux-services</name>
>   <value>mapreduce_shuffle</value>
>   <description>shuffle service that needs to be set for Map Reduce to run</description>
> </property>
>
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>true</value>
> </property>
>
> </configuration>
>
> Then I submit a job. When the job is running, I click History in the Tracking
> UI at http://172.11.12.6:8088/cluster
> and I can view all the log information. It runs OK.
> But when the job completes and I click History in the Tracking UI at
> http://172.11.12.6:8088/cluster again, it raises the following error:
>
> Firefox can't establish a connection to the server at master:19888.
>
> Am I missing some configuration information in my xml file?  How do I
> correct it?  Thanks in advance.
>
>
>
>
>
>
>
>
> - Original Message -
> From: Zhijie Shen
> To: user@hadoop.apache.org
> Sent: Monday, February 17, 2014 11:11 AM
> Subject: Re: Aggregation service start
>
>
> Please set
>
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>true</value>
> </property>
> in yarn-site.xml to enable log aggregation.
> -Zhijie
>
> On Feb 16, 2014 6:15 PM, "EdwardKing"  wrote:
>
> hadoop 2.2.0. I want to view the Tracking UI, so I visit
> http://172.11.12.6:8088/cluster,
> then I click History for a Completed Job, such as the following:
>
> MapReduce Job job_1392601388579_0001
> Attempt Number   Start Time                    Node         Logs
> 1                Sun Feb 16 17:44:57 PST 2014  master:8042  logs
>
> Then I click logs, but it fails with:
> Aggregation is not enabled. Try the nodemanager at master:8994
>
> I guess a service must not have been started; which command do I need to
> execute under home/software/hadoop-2.2.0/sbin?  Thanks.
> [hadoop@node1 sbin]$ ls
> distribute-exclude.sh    start-all.cmd        stop-all.sh
> hadoop-daemon.sh         start-all.sh         stop-balancer.sh
> hadoop-daemons.sh        start-balancer.sh    stop-dfs.cmd
> hdfs-config.cmd          start-dfs.cmd        stop-dfs.sh
> hdfs-config.sh           start-dfs.sh         stop-secure-dns.sh
> httpfs.sh                start-secure-dns.sh  stop-yarn.cmd
> mr-jobhistory-daemon.sh  start-yarn.cmd       stop-yarn.sh
> refresh-namenodes.sh     start-yarn.sh        yarn-daemon.sh
> slaves.sh                stop-all.cmd         yarn-daemons.sh
>

Re: No job shown in Hadoop resource manager web UI when running jobs in the cluster

2014-02-21 Thread Zhijie Shen
Hi Richard,

Not sure how the NPE happened on your command line, but I'd like to clarify
something here:
1. If you want to see mapreduce jobs, please use "mapred job"; "hadoop job"
is deprecated. If you want to see all kinds of applications run by your
YARN cluster, please use "yarn application" (examples below).

2. The job history server only shows the finished mapreduce jobs. There will be
another application history server that shows all completed applications
run by YARN, but it's not available in 2.2.

3. The ResourceManager web UI is not the job history web UI. You should check
your yarn-site.xml to see what the address of the RM web UI is. It will list
all the applications that the RM remembers.
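
For example, from a node with the Hadoop client configured:

  $ mapred job -list        # running MapReduce jobs
  $ yarn application -list  # all applications the RM knows about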

- Zhijie


On Thu, Feb 20, 2014 at 7:04 PM, Chen, Richard wrote:

>  Dear group,
>
>
>
> I compiled hadoop 2.2.0 x64 and am running it on a cluster. When I do hadoop
> job -list or hadoop job -list all, it throws an NPE like this:
>
> 14/01/28 17:18:39 INFO Configuration.deprecation: session.id is
> deprecated. Instead, use dfs.metrics.session-id
>
> 14/01/28 17:18:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
>
> Exception in thread "main" java.lang.NullPointerException
>
> at org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:504)
>
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:312)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1237)
>
> And on the hadoop webapps, like jobhistory (I turned on the jobhistory
> server), it shows no job running and no job finished, although I was
> running jobs.
>
> Please help me to solve this problem.
>
> Thanks!!
>
>
>
> Richard Chen
>
>
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop on windows

2014-02-22 Thread Zhijie Shen
Here are some links that may be helpful:

https://wiki.apache.org/hadoop/Hadoop2OnWindows
http://hortonworks.com/labs/microsoft/
http://hortonworks.com/blog/install-hadoop-windows-hortonworks-data-platform-2-0/

- Zhijie


On Sat, Feb 22, 2014 at 12:10 PM, oscar sumano  wrote:

> On Feb 22, 2014 2:51 PM, "oscar sumano"  wrote:
>
>> Hi,
>>
>> Is it recommended to run hadoop on windows or linux? We are a big windows
>> shop.
>>
>> I believe hortonworks is the only one with a windows distribution.
>>
>> Any guidance would be great.
>>
>> Our .net developers are pushing for hadoop on windows because they can
>> run .net hadoop streaming map reduce jobs.
>>
>> Any way to run .net map reduce jobs on linux?  That's another option we
>> are looking at.
>>
>> Thanks
>>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: YARN - Running Client with third party jars

2014-02-25 Thread Zhijie Shen
Have you included your json jar in HADOOP_CLASSPATH?
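
For example (the jar path below is just a placeholder for wherever your json-simple jar lives):

  $ export HADOOP_CLASSPATH=/path/to/json-simple.jar:$HADOOP_CLASSPATH
  $ hadoop jar yourapp.jar distributeddb.Client ...

The "hadoop jar" launcher appends HADOOP_CLASSPATH to the client JVM's classpath, which is exactly what the NoClassDefFoundError below is complaining about.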

- Zhijie


On Tue, Feb 25, 2014 at 9:22 PM, Anand Mundada wrote:

> Hi, I want to use the json jar in my client code.
> I tried to create a runnable jar which includes all the required jars,
> but I am getting the following exception:
>
> java.lang.NoClassDefFoundError: org/json/simple/parser/JSONParser
> at distributeddb.Client.readJSON(Client.java:250)
> at distributeddb.Client.getPartitionInfo(Client.java:284)
> at distributeddb.Client.main(Client.java:120)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.lang.ClassNotFoundException:
> org.json.simple.parser.JSONParser
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 8 more
>
> How to solve this issue ?
>
> Thanks,
> Anand
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: query

2014-02-25 Thread Zhijie Shen
Here is some more information related to the Windows platform:

https://wiki.apache.org/hadoop/Hadoop2OnWindows

- Zhijie


On Tue, Feb 25, 2014 at 9:56 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> Try these links:
>
> http://wiki.apache.org/hadoop/EclipseEnvironment
>
> http://blog.cloudera.com/blog/2013/05/how-to-configure-eclipse-for-hadoop-contributions/
>
> http://blog.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
> http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html
>
>
>
> Warm Regards,
> Shashwat Shriparv
>
>
>
> On Wed, Feb 26, 2014 at 11:19 AM, Banty Sharma 
> wrote:
>
>> Hi,
>>
>> Step by step, I want to build and develop hadoop in Eclipse on
>> Windows. Can anybody help me find the hadoop source code and a document on
>> how I can import it into Eclipse on Windows?
>>
>> Thanx n Regards
>>
>> Jhanver sharma
>>
>>
>> On Mon, Feb 24, 2014 at 4:03 PM, Banty Sharma 
>> wrote:
>>
>>> Hello! I want information about hadoop development: where can I
>>> find the actual procedure for solving issues?
>>>
>>
>>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Unmanaged application issue on YARN

2014-02-25 Thread Zhijie Shen
Hi Jeff,

To get the AMRMToken, you can use YarnClient#getAMRMToken. For an unmanaged
application, you can have a look at UnmanagedAMLauncher, if you haven't checked
it before. It may simplify your problem.
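
A minimal sketch of the client side (assuming appId is the ApplicationId of the unmanaged application you already submitted; error handling omitted):

  import org.apache.hadoop.security.UserGroupInformation;
  import org.apache.hadoop.security.token.Token;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;
  import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

  YarnClient yarnClient = YarnClient.createYarnClient();
  yarnClient.init(new YarnConfiguration());
  yarnClient.start();

  // Fetch the AMRMToken for the unmanaged AM and attach it to the UGI that
  // will call registerApplicationMaster(), so the TOKEN auth can succeed.
  Token<AMRMTokenIdentifier> amrmToken = yarnClient.getAMRMToken(appId);
  UserGroupInformation ugi = UserGroupInformation.createRemoteUser(appId.toString());
  ugi.addToken(amrmToken);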

- Zhijie


On Tue, Feb 25, 2014 at 9:38 PM, Jeff Zhang  wrote:

> Hi all,
>
> I built an unmanaged application and submitted it to YARN (hadoop 2.2), but
> encountered the following exception:
>
> Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
> SIMPLE authentication is not enabled.  Available:[TOKEN]
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
> ProtobufRpcEngine.java:206)
>
> at $Proxy9.registerApplicationMaster(Unknown Source)
>
> at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(
> ApplicationMasterProtocolPBClientImpl.java:106)
> I found one jira ticket very similar to this issue, but it looks like
> it has been resolved in 2.2:
> https://issues.apache.org/jira/browse/YARN-945
>
>
> I find that ResourceTrackerService uses simple authentication but
> ApplicationMasterService uses a token. The reason, I think, is the following
> code snippet in ApplicationMasterService (line 127):
>
>     serverConf.set(
>         CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
>         SaslRpcServer.AuthMethod.TOKEN.toString());
>
> I'm not sure why the token is used here. Could anyone explain that and guide
> me on how to resolve my issue? Thanks
>
>
> Jeff Zhang
>
>
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread Zhijie Shen
This is the link about cluster setup:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

- Zhijie


On Thu, Feb 27, 2014 at 9:41 PM, Alex Lee  wrote:

> Hello,
>
> I am quite a newbie here and want to set up hadoop 2.3 on 4 new PCs; later I
> may add more PCs to it. Is there any tutorial I can learn from, covering
> which linux version I should use, how to set up linux, and how to
> install hadoop step by step?
>
> I am trying to set up a cluster and aim to store TBs of data. Any suggestions?
>
> With Best Regards,
>
> Alex
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Meaning of messages in log and debugging

2014-03-04 Thread Zhijie Shen
bq. Container killed by the ApplicationMaster. Container killed on request.
Exit code is 143" mean? What does 143 stand for?

It's the diagnostic message generated by YARN, which indicates that the
container was killed by MR's ApplicationMaster. 143 is an exit code of a
YARN container indicating its termination: 143 = 128 + 15, i.e., the container
process ended on the SIGTERM sent during the kill.

bq. Are there any related links which describe the life cycle of a
container?

This is what I found online:
http://diggerk.wordpress.com/2013/09/19/lifecycle-of-yarn-resource-manager-containers/.
Otherwise, you can have a look at ContainerImpl.java if you want to know
the detail.

bq. My application is very memory intense... is there any way to profile the
memory consumption of a single container?

You can find the metrics info in the RM and NM web UIs, or you
can programmatically access their RESTful APIs.
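
For example, a sketch against the NM's REST API (assuming the default NM web port 8042):

  # list the containers on a node, with their states and allocated memory
  $ curl http://<nm-host>:8042/ws/v1/node/containers

For consumption over time, the NM log also periodically prints "Memory usage of ProcessTree ..." lines for every monitored container.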

- Zhijie


On Tue, Mar 4, 2014 at 7:24 AM, Yves Weissig  wrote:

> Hello list,
>
> I'm currently debugging my Hadoop MR application and I have some general
> questions about the messages in the log and the debugging process.
>
> - What does "Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143" mean? What does 143 stand
> for?
>
> - I also see the following exception in the log: "Exception from
> container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)". What does this mean? It
> originates from a "Diagnostics report" from a container and the log4j
> message level is set to INFO.
>
> - Are there any related links which describe the life cycle of a container?
>
> - Is there a "golden rule" to debug a Hadoop MR application?
>
> - My application is very memory intense... is there any way to profile
> the memory consumption of a single container?
>
> Thanks!
> Best regards
> Yves
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: how to import the hadoop code into eclipse.

2014-03-06 Thread Zhijie Shen
mvn eclipse:eclipse, and then import the existing projects in eclipse.
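
A sketch of the usual sequence from the top of the source tree (the -DskipTests flags are just my suggestion to speed things up; building once first matters because parts of the tree rely on generated sources):

  $ mvn install -DskipTests
  $ mvn eclipse:eclipse -DskipTests

Then, in Eclipse: File > Import > General > Existing Projects into Workspace, pointed at the source tree.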

- Zhijie


On Thu, Mar 6, 2014 at 9:00 PM, Avinash Kujur  wrote:

> hi,
>
> I have downloaded the hadoop code and executed the maven command
> successfully. How do I import the hadoop source code cleanly? It's showing a
> red exclamation mark on some of the modules while I am importing them.
> Help me out.
> Thanks in advance.
>
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: how to import the hadoop code into eclipse.

2014-03-06 Thread Zhijie Shen
Ah yes, I was experiencing some errors on the imported modules, but I
fixed them myself manually. I'm not sure whether other people have encountered
the same problem. Here's a link: http://wiki.apache.org/hadoop/EclipseEnvironment


On Thu, Mar 6, 2014 at 9:30 PM, Avinash Kujur  wrote:

> I did that, but I have some doubts about importing the code, because it's
> showing some warnings and errors on the imported modules. I was wondering if
> you could give me a link to the proper procedure.
>
>
> On Thu, Mar 6, 2014 at 9:21 PM, Zhijie Shen  wrote:
>
>> mvn eclipse:eclipse, and then import the existing projects in eclipse.
>>
>> - Zhijie
>>
>>
>> On Thu, Mar 6, 2014 at 9:00 PM, Avinash Kujur  wrote:
>>
>>> hi,
>>>
>>> I have downloaded the hadoop code and executed the maven command
>>> successfully. How do I import the hadoop source code cleanly? It's showing a
>>> red exclamation mark on some of the modules while I am importing them.
>>> Help me out.
>>> Thanks in advance.
>>>
>>>
>>
>>
>>
>> --
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Can a YARN Cient or Application Master determine when log aggregation has completed?

2014-03-10 Thread Zhijie Shen
Hi Geoff,

Unfortunately, there's no such API for users to determine whether log
aggregation is complete or not, but the issue is being tackled. You can
keep an eye on YARN-1279.

- Zhijie


On Mon, Mar 10, 2014 at 10:18 AM, Geoff Thompson  wrote:

> Hello,
>
> Log aggregation is great. However, if a yarn application runs a large
> number of tasks which generate large logs, it takes some finite amount of
> time for all of the logs to be collected and written to the HDFS.
>
> Currently our client code runs the equivalent of the "yarn logs" command
> once all tasks have completed. This works fine provided log aggregation is
> complete.
>
> But it fails in a variety of ways if aggregation is not complete. This
> includes one case where the "yarn logs" code encounters no exceptions and
> no non-zero return codes from methods, but returns an empty string.
>
> So, is there a way to determine if log aggregation is complete?
>
> Thanks,
>
> Geoff
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: task is still running on node has no disk space

2014-03-30 Thread Zhijie Shen
Hi Anfernee,

In 2.2, LocalDirsHandlerService doesn't check whether the disk is full or
not. It seems that a disk-fullness check will be available in 2.4: YARN-1781

- Zhijie


On Sun, Mar 30, 2014 at 10:33 AM, Anfernee Xu  wrote:

> Hi,
>
> I'm running 2.2.0 clusters, and my application is pretty disk-I/O
> expensive (processing huge zip files). Over time I found some job failures due
> to "no space on disk". Normally the leftover files get cleaned up, but if for
> some reason they're not, I expect no more new tasks to run on that node; in
> fact, I still see new tasks coming to that node and continuing to fail. My
> application writes data to /tmp (which may run out of disk space), so I
> configured the properties below:
>
> <property>
>   <name>yarn.nodemanager.local-dirs</name>
>   <value>/scratch/usr/software/hadoop2/hadoop-dc/temp/nm-local-dir,/tmp/nm-local-dir</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
>   <value>1.0</value>
> </property>
>
> As I have /tmp/nm-local-dir as part of $yarn.nodemanager.local-dirs, based
> on doc
>
> yarn.nodemanager.disk-health-checker.min-healthy-disks:
>
> The minimum fraction of number of disks to be healthy for the nodemanager
> to launch new containers. This corresponds to both
> yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. i.e. If there
> are less number of healthy local-dirs (or log-dirs) available, then new
> containers will not be launched on this node.
>
> Did I miss anything?
>
> --
> --Anfernee
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: hadoop version

2014-03-31 Thread Zhijie Shen
Run "hadoop version"


On Mon, Mar 31, 2014 at 2:22 AM, Avinash Kujur  wrote:

> hi,
>
> how can i know my hadoop version which i have build in my system (apart
> from the version which was in-built in cloudera.)
>
> regards,
> Avinash
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: hadoop version

2014-03-31 Thread Zhijie Shen
I think you may want to use the static methods from VersionInfo.
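
A minimal sketch (org.apache.hadoop.util.VersionInfo ships with hadoop-common; the class name below is just an example):

  import org.apache.hadoop.util.VersionInfo;

  public class PrintHadoopVersion {
    public static void main(String[] args) {
      // These report the version baked into the hadoop-common jar on the
      // classpath, which is not necessarily what the cluster itself runs.
      System.out.println("version:  " + VersionInfo.getVersion());
      System.out.println("revision: " + VersionInfo.getRevision());
      System.out.println("built as: " + VersionInfo.getBuildVersion());
    }
  }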


On Mon, Mar 31, 2014 at 9:20 AM, Steve Lewis  wrote:

> How about programmatically, from my own code?
>
>
> On Mon, Mar 31, 2014 at 9:09 AM, Zhijie Shen wrote:
>
>> Run "hadoop version"
>>
>>
>> On Mon, Mar 31, 2014 at 2:22 AM, Avinash Kujur  wrote:
>>
>>> hi,
>>>
>>> how can i know my hadoop version which i have build in my system (apart
>>> from the version which was in-built in cloudera.)
>>>
>>> regards,
>>> Avinash
>>>
>>>
>>
>>
>> --
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Container states trantition questions

2014-04-02 Thread Zhijie Shen
It should be normal. If you check the diagnostics of the container, it
is likely that you will see "Container killed by the ApplicationMaster." The MR
AM will stop the container when a task is finished.

Thanks,
Zhijie


On Wed, Apr 2, 2014 at 7:22 PM, Fengyun RAO  wrote:

> same for me. all mapper ends with 143.
>
> I've no idea what it means
>
>
> 2014-04-03 8:45 GMT+08:00 Azuryy Yu :
>
> Hi,
>>
>> Is it normal for each container to end with TERMINATED (143)?
>> The whole MR job is successful, but all containers in the map phase end
>> with 143.
>>
>> There aren't any useful logs in the NM, AM, or Container logs.
>>
>> Another minor question:
>> There are only WARN logs in the stderr:
>> log4j:WARN No appenders could be found for logger
>> (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
>> log4j:WARN Please initialize the log4j system properly.
>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
>> more info.
>> It seems it cannot find log4j.properties, but I've configured:
>> <property>
>>   <name>mapreduce.application.classpath</name>
>>   <value>$HADOOP_MAPRED_HOME/conf,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*</value>
>> </property>
>> <property>
>>   <name>yarn.application.classpath</name>
>>   <value>$HADOOP_COMMON_HOME/conf,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*</value>
>> </property>
>>
>> Appreciate for any inputs.
>>
>>
>>
>>
>>
>>
>>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: download hadoop-2.4

2014-04-10 Thread Zhijie Shen
The official release can be found on:
http://www.apache.org/dyn/closer.cgi/hadoop/common/

But you can also choose to check out the code from the svn/git repository.
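
For example, using the svn tag mentioned below (the git mirror URL and tag name are my assumption of the mirror layout at the time):

  $ svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/
  $ # or
  $ git clone git://git.apache.org/hadoop-common.git
  $ cd hadoop-common && git checkout release-2.4.0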


On Thu, Apr 10, 2014 at 8:08 PM, Mingjiang Shi  wrote:

> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/
>
>
> On Fri, Apr 11, 2014 at 10:23 AM, lei liu  wrote:
>
>> Hadoop-2.4 is release, where can I download the hadoop-2.4 code from?
>>
>>
>> Thanks,
>>
>> LiuLei
>>
>
>
>
> --
> Cheers
> -MJ
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
1. If you have the binaries that were compiled against MRv1 *mapred* libs,
it should just work with MRv2.
2. If you have the source code that refers to MRv1 *mapred* libs, it should
be compilable without code changes. Of course, you're free to change your
code.
3. If you have the binaries that were compiled against MRv1 *mapreduce* libs,
it may not be executable directly with MRv2, but you should be able to compile
it against MRv2 *mapreduce* libs without code changes, and execute it (see the
sketch below).
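
A sketch of that recompile path for case 3, assuming a hypothetical single-file job MyJob.java and a Hadoop 2.x client installed locally ("hadoop classpath" prints the MRv2 jars on the machine):

  $ javac -cp "$(hadoop classpath)" -d classes MyJob.java
  $ jar cf myjob.jar -C classes .
  $ hadoop jar myjob.jar MyJob <input> <output>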

- Zhijie


On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe
wrote:

> Thanks John for your comments,
>
> I believe MRv2 has support for both the old *mapred* APIs and new
> *mapreduce* APIs.
>
> I see this way:
> [1.]  One may have binaries i.e. jar file of the M\R program that used old
> *mapred* APIs
> This will work directly on MRv2(YARN).
>
> [2.]  One may have the source code i.e. Java Programs of the M\R program
> that used old *mapred* APIs
> For this I need to recompile and generate the binaries i.e. jar file.
> Do I have to change the old *org.apache.hadoop.mapred* APIs to new *
> org.apache.hadoop.mapreduce* APIs? or No code changes are needed?
>
> -RR
>
> > Date: Mon, 14 Apr 2014 10:37:56 -0400
> > Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
> with old *mapred* APIs and new *mapreduce* APIs in Hadoop
> > From: john.meag...@gmail.com
> > To: user@hadoop.apache.org
>
> >
> > Also, "Source Compatibility" also means ONLY a recompile is needed.
> > No code changes should be needed.
> >
> > On Mon, Apr 14, 2014 at 10:37 AM, John Meagher 
> wrote:
> > > Source Compatibility = you need to recompile and use the new version
> > > as part of the compilation
> > >
> > > Binary Compatibility = you can take something compiled against the old
> > > version and run it on the new version
> > >
> > > On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
> > >  wrote:
> > >> Hello People,
> > >>
> > >> As per the Apache site
> > >>
> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
> > >>
> > >> Binary Compatibility
> > >> 
> > >> First, we ensure binary compatibility to the applications that use old
> > >> mapred APIs. This means that applications which were built against
> MRv1
> > >> mapred APIs can run directly on YARN without recompilation, merely by
> > >> pointing them to an Apache Hadoop 2.x cluster via configuration.
> > >>
> > >> Source Compatibility
> > >> 
> > >> We cannot ensure complete binary compatibility with the applications
> that
> > >> use mapreduce APIs, as these APIs have evolved a lot since MRv1.
> However, we
> > >> ensure source compatibility for mapreduce APIs that break binary
> > >> compatibility. In other words, users should recompile their
> applications
> > >> that use mapreduce APIs against MRv2 jars. One notable binary
> > >> incompatibility break is Counter and CounterGroup.
> > >>
> > >> For "Binary Compatibility" I understand that if I had build a MR job
> with
> > >> old *mapred* APIs then they can be run directly on YARN without and
> changes.
> > >>
> > >> Can anybody explain what do we mean by "Source Compatibility" here
> and also
> > >> a use case where one will need it?
> > >>
> > >> Does that mean code changes if I already have a MR job source code
> written
> > >> with with old *mapred* APIs and I need to make some changes to it to
> run in
> > >> then I need to use the new "mapreduce* API and generate the new
> binaries?
> > >>
> > >> Thanks,
> > >> -RR
> > >>
> > >>
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
bq. Regarding #3 if I have ONLY the binaries i.e. jar file (compiled\build
against old MRv1 mapred APIS)

Which APIs are you talking about, *mapred* or *mapreduce*? In #3, I was
talking about *mapreduce*. If this is the case, you may unfortunately be in
trouble, because MRv2 has evolved so much in the *mapreduce* APIs that
it's difficult to ensure binary compatibility. Anyway, you should still try
your luck, as your binaries may not use the incompatible APIs. On the other
hand, if you meant the *mapred* APIs instead, your binaries should just work.

- Zhijie


On Tue, Apr 15, 2014 at 1:35 PM, Radhe Radhe
wrote:

> Thanks Zhijie for the explanation.
>
> Regarding #3 if I have ONLY the binaries i.e. jar file (compiled\build
> against old MRv1 *mapred* APIS) then how can I compile it since I don't
> have the source code i.e. Java files. All I can do with binaries i.e. jar
> file is execute it.
>
> -RR
> --
> Date: Tue, 15 Apr 2014 13:03:53 -0700
>
> Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
> with old *mapred* APIs and new *mapreduce* APIs in Hadoop
> From: zs...@hortonworks.com
> To: user@hadoop.apache.org
>
>
> 1. If you have the binaries that were compiled against MRv1 *mapred* libs, it
> should just work with MRv2.
> 2. If you have the source code that refers to MRv1 *mapred* libs, it
> should be compilable without code changes. Of course, you're free to change
> your code.
> 3. If you have the binaries that were compiled against MRv1 *mapreduce* libs,
> it may not be executable directly with MRv2, but you should be able to compile
> it against MRv2 *mapreduce* libs without code changes, and execute it.
>
> - Zhijie
>
>
> On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe <
> radhe.krishna.ra...@live.com> wrote:
>
> Thanks John for your comments,
>
> I believe MRv2 has support for both the old *mapred* APIs and new
> *mapreduce* APIs.
>
> I see this way:
> [1.]  One may have binaries i.e. jar file of the M\R program that used old
> *mapred* APIs
> This will work directly on MRv2(YARN).
>
> [2.]  One may have the source code i.e. Java Programs of the M\R program
> that used old *mapred* APIs
> For this I need to recompile and generate the binaries i.e. jar file.
> Do I have to change the old *org.apache.hadoop.mapred* APIs to new *
> org.apache.hadoop.mapreduce* APIs? or No code changes are needed?
>
> -RR
>
> > Date: Mon, 14 Apr 2014 10:37:56 -0400
> > Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
> with old *mapred* APIs and new *mapreduce* APIs in Hadoop
> > From: john.meag...@gmail.com
> > To: user@hadoop.apache.org
>
> >
> > Also, "Source Compatibility" also means ONLY a recompile is needed.
> > No code changes should be needed.
> >
> > On Mon, Apr 14, 2014 at 10:37 AM, John Meagher 
> wrote:
> > > Source Compatibility = you need to recompile and use the new version
> > > as part of the compilation
> > >
> > > Binary Compatibility = you can take something compiled against the old
> > > version and run it on the new version
> > >
> > > On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
> > >  wrote:
> > >> Hello People,
> > >>
> > >> As per the Apache site
> > >>
> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
> > >>
> > >> Binary Compatibility
> > >> 
> > >> First, we ensure binary compatibility to the applications that use old
> > >> mapred APIs. This means that applications which were built against
> MRv1
> > >> mapred APIs can run directly on YARN without recompilation, merely by
> > >> pointing them to an Apache Hadoop 2.x cluster via configuration.
> > >>
> > >> Source Compatibility
> > >> 
> > >> We cannot ensure complete binary compatibility with the applications
> that
> > >> use mapreduce APIs, as these APIs have evolved a lot since MRv1.
> However, we
> > >> ensure source compatibility for mapreduce APIs that break binary
> > >> compatibility. In other words, users should recompile their
> applications
> > >> that use mapreduce APIs against MRv2 jars. One notable binary
> > >> incompatibility break is Counter and CounterGroup.
> > >>
> > >> For "Binary Compatibility" I understand that if I had build a MR job
> with
> > >> old *mapred* APIs then they can be run directly on YARN without and
>

Re: Differences between HistoryServer and Yarn TimeLine server?

2014-04-22 Thread Zhijie Shen
In Hadoop 2.4, we have delivered the timeline server at a preview stage,
which can actually serve some generic YARN application history as well as
framework-specific information. Due to the development logistics, we have
created two concepts: History Server and Timeline Server. To be simple, you
can consider the history server as the service for generic YARN application
information, and the timeline server as the service for framework-specific
information. Importantly, we have just one daemon, which includes both
services, and which we'd like to call the timeline server (unfortunately, the
confusing thing is that the command to start the daemon is "historyserver").
We're continuing to work on the timeline server to integrate these two parts,
including refactoring the names.

BTW, if you mean MapReduce JobHistoryServer by HistoryServer, it's a
different daemon, which serves the historic information of MapReduce jobs
only.
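
For reference, a sketch of the two start commands in a 2.4 install (assuming the default sbin layout):

  $ sbin/yarn-daemon.sh start historyserver           # the YARN timeline/history daemon
  $ sbin/mr-jobhistory-daemon.sh start historyserver  # the MapReduce JobHistoryServer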


On Tue, Apr 22, 2014 at 8:44 PM, sam liu  wrote:

> Hi Experts,
>
> I am confusing on these two concepts. Could you help explain the
> differences?
>
> Thanks!
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Differences between HistoryServer and Yarn TimeLine server?

2014-04-23 Thread Zhijie Shen
Sam,

You're right. We can definitely integrate MapReduce to use the timeline
server to store and serve its specific data, and this is actually our plan.

However, it's a big move, and we still need time to get it done. In
addition, so as not to disturb the users who currently rely on JHS for MR
job information, we cannot simply remove JHS from Hadoop.


On Wed, Apr 23, 2014 at 8:15 PM, sam liu  wrote:

> Zhijie,
>
> I am much clear now. Thanks a lot!
>
> As I understand it, besides the previous Job History Server, Hadoop now has
> a new timeline server which can store both the generic YARN application
> history and the framework-specific information. However, I think the
> timeline server also includes the functions of the Job History Server,
> because it can store the framework-specific information (which, of course,
> includes the MapReduce framework). In other words, the Job History Server
> is not necessary any more. *If that's the case, why does Hadoop still
> include the Job History Server?*
>
>
> 2014-04-23 12:56 GMT+08:00 Zhijie Shen :
>
>> In Hadoop 2.4, we have delivered the timeline server at a preview stage,
>> which actually can serve some generic YARN application history as well as
>> the framework specific information. Due to the development logistics, we
>> have created the two concepts: History Server and Timeline Server. To be
>> simple, you can consider the history server of the service of the generic
>> YARN application information, while consider the timeline server of the
>> service of the framework specific information. Importantly, we just have
>> one daemon, which includes both services, and which we'd like to call
>> timeline server (unfortunately, the confusing thing is that the command to
>> start the daemon is "historyserver"). We're going on working on the
>> timeline server to integrate these two parts, including refactoring the
>> names.
>>
>> BTW, if you mean MapReduce JobHistoryServer by HistoryServer, it's a
>> different daemon, which serves the historic information of MapReduce jobs
>> only.
>>
>>
>> On Tue, Apr 22, 2014 at 8:44 PM, sam liu  wrote:
>>
>>> Hi Experts,
>>>
>>> I am confusing on these two concepts. Could you help explain the
>>> differences?
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Differences between HistoryServer and Yarn TimeLine server?

2014-04-24 Thread Zhijie Shen
Ashwin,

YARN-321 focuses on the generic application history service, while
YARN-1530 covers the framework-specific data service. And yes, the timeline
server is going to cover both.

We didn't have such a JIRA before, but the work is described in YARN-321's
design doc. Anyway, I have opened a JIRA (MAPREDUCE-5858) to track this issue.


On Wed, Apr 23, 2014 at 11:25 PM, Ashwin Shankar
wrote:

> Hi Zhijie,
> There seem to be two umbrella JIRAs for this - YARN-321 and YARN-1530. Can
> you please let me know what the difference is? Is the timeline server
> finally going to be YARN-321 + YARN-1530?
>
> You mentioned that MR is going to be integrated with the timeline server;
> is there a JIRA I can watch?
>
> Thanks,
> Ashwin
>
>
> On Wed, Apr 23, 2014 at 10:15 PM, Zhijie Shen wrote:
>
>> Sam,
>>
>> You're right. We can definitely integrate MapReduce to use the timeline
>> server to store and serve its specific data, and this is actually our plan.
>>
>> However, it's a big move, and we still need time to get it done. In
>> addition, not to disturb the users that are currently relying on JHS for MR
>> job information, we cannot simply remove JHS from Hadoop.
>>
>>
>> On Wed, Apr 23, 2014 at 8:15 PM, sam liu  wrote:
>>
>>> Zhijie,
>>>
>>> I am much clear now. Thanks a lot!
>>>
>>> As my understanding, besides previous Job History Server, hadoop now has
>>> a new timeline server which could restore both the generic YARN application
>>> history and the framework specific information. However, I think the
>>> timeline server also include the functions of Job History Server, because
>>> it can store the framework specific information(of course, include
>>> mapreduce framework). In another words, Job History Server is not necessary
>>> any more.* If that's the case, why hadoop still include Job History
>>> Server?*
>>>
>>>
>>> 2014-04-23 12:56 GMT+08:00 Zhijie Shen :
>>>
>>>>  In Hadoop 2.4, we have delivered the timeline server at a preview
>>>> stage, which actually can serve some generic YARN application history as
>>>> well as the framework specific information. Due to the development
>>>> logistics, we have created the two concepts: History Server and Timeline
>>>> Server. To be simple, you can consider the history server of the service of
>>>> the generic YARN application information, while consider the timeline
>>>> server of the service of the framework specific information. Importantly,
>>>> we just have one daemon, which includes both services, and which we'd like
>>>> to call timeline server (unfortunately, the confusing thing is that the
>>>> command to start the daemon is "historyserver"). We're going on working on
>>>> the timeline server to integrate these two parts, including refactoring the
>>>> names.
>>>>
>>>> BTW, if you mean MapReduce JobHistoryServer by HistoryServer, it's a
>>>> different daemon, which serves the historic information of MapReduce jobs
>>>> only.
>>>>
>>>>
>>>> On Tue, Apr 22, 2014 at 8:44 PM, sam liu wrote:
>>>>
>>>>> Hi Experts,
>>>>>
>>>>> I am confusing on these two concepts. Could you help explain the
>>>>> differences?
>>>>>
>>>>> Thanks!
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Zhijie Shen
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>>

Re: Differences between HistoryServer and Yarn TimeLine server?

2014-04-25 Thread Zhijie Shen
1 and 4:
We have thought about this: in addition to serving application-specific
data, the timeline server should accept a web UI plugin from the
application, install it, and render the data on the web page according to
the application's design, but we still need to figure out the plan. Until
then, the application needs to take care of the data rendering itself, or
make use of a third-party monitoring service, such as Ambari, which, AFAIK,
has integration with the timeline server in its recent release (Tez is
leveraging it). And yes, it's always welcome if somebody wants to contribute.

2:
REST APIs are available for accessing both the generic data and the
framework-specific data. For the API specification, you can temporarily
look at the patch in YARN-1876.

3:
In terms of services, they're almost there. The next steps are the
security, scalability and integration work.
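
For a concrete starting point, here is a minimal sketch of publishing
framework-specific data with the Java timeline client that ships in 2.4. The
entity type, id, and info key below are illustrative placeholders, and it
assumes yarn.timeline-service.enabled is set and the daemon is running:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelinePutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("MY_FRAMEWORK_JOB"); // placeholder entity type
      entity.setEntityId("job_0001");           // placeholder entity id
      entity.setStartTime(System.currentTimeMillis());
      entity.addOtherInfo("status", "RUNNING"); // placeholder info key/value
      client.putEntities(entity);               // POSTs to /ws/v1/timeline
    } finally {
      client.stop();
    }
  }
}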


On Thu, Apr 24, 2014 at 11:11 PM, Ashwin Shankar
wrote:

> Thanks Zhijie!
> I had a few more questions:
> 1. I played around with the timeline server UI today, which showed the
> generic application history details, but I couldn't find any page for
> application-specific data. Is the expectation that every application
> needs to build its own UI using the exposed REST APIs and somehow
> install it with the timeline server? Or am I missing something?
> 2. Are there REST APIs for accessing both generic and framework-specific
> data in 2.4.0?
> 3. Is there an approximate timeframe for the timeline server to be
> feature complete?
> 4. Tez doesn't have any job history UI; is there any work being done to
> integrate Tez with the timeline server? If not, is the timeline server
> ready for such integration in case someone wants to pick this up?
>
> Thanks,
> Ashwin
>
>
>
> On Thu, Apr 24, 2014 at 12:00 AM, Zhijie Shen wrote:
>
>> Ashwin,
>>
>> YARN-321 focuses on the issue in the scope of generic application history
>> service, while YARN-1530 covers the framework specific data service. And
>> yes, the timeline server is going to cover both.
>>
>> We've not such a Jira before, but it is described in YARN-321's design
>> doc. Anyway, I open a Jira (MAPREDUCE-5858) to track this issue.
>>
>>
>> On Wed, Apr 23, 2014 at 11:25 PM, Ashwin Shankar <
>> ashwinshanka...@gmail.com> wrote:
>>
>>> Hi Zhijie,
>>> There seems to two umbrella jiras for this - YARN-321 and YARN-1530,can
>>> you please let me know what is the
>>> difference ? Is timeline server finally going to be YARN321+YARN1530 ?
>>>
>>> You mentioned that MR is going to integrated with timeline server,is
>>> there a jira I can watch ?
>>>
>>> Thanks,
>>> Ashwin
>>>
>>>
>>> On Wed, Apr 23, 2014 at 10:15 PM, Zhijie Shen wrote:
>>>
>>>> Sam,
>>>>
>>>> You're right. We can definitely integrate MapReduce to use the timeline
>>>> server to store and serve its specific data, and this is actually our plan.
>>>>
>>>> However, it's a big move, and we still need time to get it done. In
>>>> addition, not to disturb the users that are currently relying on JHS for MR
>>>> job information, we cannot simply remove JHS from Hadoop.
>>>>
>>>>
>>>> On Wed, Apr 23, 2014 at 8:15 PM, sam liu wrote:
>>>>
>>>>> Zhijie,
>>>>>
>>>>> I am much clear now. Thanks a lot!
>>>>>
>>>>> As my understanding, besides previous Job History Server, hadoop now
>>>>> has a new timeline server which could restore both the generic YARN
>>>>> application history and the framework specific information. However, I
>>>>> think the timeline server also include the functions of Job History 
>>>>> Server,
>>>>> because it can store the framework specific information(of course, include
>>>>> mapreduce framework). In another words, Job History Server is not 
>>>>> necessary
>>>>> any more.* If that's the case, why hadoop still include Job History
>>>>> Server?*
>>>>>
>>>>>
>>>>> 2014-04-23 12:56 GMT+08:00 Zhijie Shen :
>>>>>
>>>>>>  In Hadoop 2.4, we have delivered the timeline server at a preview
>>>>>> stage, which actually can serve some generic YARN application history as
>>>>>> well as the framework specific information. Due to the development
>>>>>> logistics, we have created the two concepts: History Server and Timeline

Re: Enable PseudoAuthenticator | org.apache.hadoop.security.authentication.client.PseudoAuthenticator

2014-06-13 Thread Zhijie Shen
You can follow the instructions at:
https://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/HttpAuthentication.html
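
As a pointer: per that doc, the pseudo/simple mechanism is selected on the
server side by setting hadoop.http.authentication.type to "simple" in
core-site.xml. On the client side, a minimal sketch of using the
PseudoAuthenticator class directly against a Hadoop HTTP endpoint might look
like this (the host, endpoint, and user name are illustrative placeholders):

import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.hadoop.security.authentication.client.AuthenticatedURL;
import org.apache.hadoop.security.authentication.client.PseudoAuthenticator;

public class PseudoAuthExample {
  public static void main(String[] args) throws Exception {
    // Reusable token; the server sets a signed cookie after the first call.
    AuthenticatedURL.Token token = new AuthenticatedURL.Token();
    // Illustrative endpoint; any web endpoint behind the "simple" filter works.
    URL url = new URL("http://namenode.example.com:50070/jmx");
    // PseudoAuthenticator just sends user.name (the system property by
    // default); override getUserName() to supply a specific user.
    PseudoAuthenticator auth = new PseudoAuthenticator() {
      @Override
      protected String getUserName() {
        return "peter"; // illustrative user
      }
    };
    HttpURLConnection conn = new AuthenticatedURL(auth).openConnection(url, token);
    System.out.println("HTTP " + conn.getResponseCode());
  }
}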

- Zhijie


On Fri, Jun 13, 2014 at 8:13 PM, pmanolov 
wrote:

> Hi guys,
> How can I enable the PseudoAuthenticator? I am trying to connect Oozie and
> Hadoop, but it fails, so I was thinking that I could make it use the
> PseudoAuthenticator somehow. I couldn't find anything in the documentation
> about how one sets the security authenticator.
>
> Regards,
> Peter
>
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop 1.2

2014-06-18 Thread Zhijie Shen
It depends on which group of APIs your application is using. Please refer
to this doc for details:

http://hadoop.apache.org/docs/r2.4.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
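
In short, a job written purely against the old org.apache.hadoop.mapred API
falls under the binary-compatibility guarantee, so the existing jar should
run on YARN without recompilation, whereas code using
org.apache.hadoop.mapreduce may need to be recompiled against the 2.x jars.
For illustration, a minimal old-API driver (it uses the identity
mapper/reducer so it is self-contained; input and output paths come from the
command line):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class OldApiJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(OldApiJob.class);
    conf.setJobName("old-mapred-api-demo");
    // Old mapred API throughout; nothing here changed between MRv1 and YARN.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class); // TextInputFormat keys
    conf.setOutputValueClass(Text.class);       // TextInputFormat values
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf); // old submission path; works on a YARN cluster too
  }
}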


On Thu, Jun 19, 2014 at 2:24 AM, Mohit Anchlia 
wrote:

> Does Hadoop MapReduce code compiled against 1.2 work with YARN?
>
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-core</artifactId>
>   <version>1.2.1</version>
> </dependency>
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop 2.2.0 : Job not showing on resource management web UI

2014-06-28 Thread Zhijie Shen
> However, the above job does not show up at the web UI:
> http://localhost:8088/cluster/apps
>
> My configurations are below,
> mapred-site.xml:
>
> <configuration>
>   <property>
>     <name>mapreduce.framework.name</name>
>     <value>yarn</value>
>   </property>
> </configuration>
>
> ---
> hdfs-site.xml
>
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/home/matmsh/temp/hadoop2/namenode</value>
>     <description>The name of the default file system.</description>
>   </property>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/home/matmsh/temp/hadoop2/datanode</value>
>     <description>The name of the default file system.</description>
>   </property>
> </configuration>
>
> ---
> yarn-site.xml
>
> <configuration>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>   </property>
> </configuration>
> What other configurations are needed to show the job in the web UI ?
> Thanks in advance for any assistance !
>
> Shing
>
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: HDP hadoop 2.4.1 fails to run mapreduce app

2014-07-24 Thread Zhijie Shen
Would you please change the log level to DEBUG to see what happens when
creating the client protocol provider?
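
For example, assuming the stock Hadoop shell scripts (which honor the
HADOOP_ROOT_LOGGER environment variable), rerunning the job as

HADOOP_ROOT_LOGGER=DEBUG,console /usr/lib/hadoop/bin/hadoop jar \
    /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar pi 2 5

should print the provider-loading details to the console.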

On Thu, Jul 24, 2014 at 2:13 AM, MrAsanjar .  wrote:

> Please help;
> I have verified mapred-site.xml => mapreduce.framework.name=yarn and
> verified HADOOP_CLASSPATH in hadoop-env.sh.
>
>  /usr/lib/hadoop/bin/hadoop jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar pi 2 5
> ...
> .
> .
> Wrote input for Map #0
> Wrote input for Map #1
> Starting Job
> 14/07/23 13:06:47 INFO client.RMProxy: Connecting to ResourceManager at /
> 0.0.0.0:8032
> 14/07/23 13:06:47 INFO mapreduce.Cluster: Failed to use
> org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: Error in
> instantiating YarnClient
> java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
> at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
> at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
> at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> at
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
> at
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: ulimit for Hive

2014-08-12 Thread Zhijie Shen
+ Hive user mailing list

That should be a better place for your questions.


On Mon, Aug 11, 2014 at 3:17 PM, Ana Gillan  wrote:

> Hi,
>
> I’ve been reading a lot of posts about needing to set a high ulimit for
> file descriptors in Hadoop and I think it’s probably the cause of a lot of
> the errors I’ve been having when trying to run queries on larger data sets
> in Hive. However, I’m really confused about how and where to set the limit,
> so I have a number of questions:
>
>1. How high is it recommended to set the ulimit?
>2. What is the difference between soft and hard limits? Which one
>needs to be set to the value from question 1?
>3. For which user(s) do I set the ulimit? If I am running the Hive
>query with my login, do I set my own ulimit to the high value?
>4. Do I need to set this limit for these users on all the machines in
>the cluster? (we have one master node and 6 slave nodes)
>5. Do I need to restart anything after configuring the ulimit?
>
> Thanks in advance,
> Ana
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop 2.5.0 unit tests failures

2014-08-31 Thread Zhijie Shen
Hi Rajat,

It is a known situation that some test cases have race conditions and fail
intermittently. In most cases, contributors are working on a Linux box, so
a test case may implicitly make some assumption that is not valid on other
systems.

It would be appreciated if you report these test failures on JIRA, with the
environment details and the exception traces, so that the community can
investigate the test problems.

Thanks,
Zhijie


On Fri, Aug 29, 2014 at 10:06 AM, Rajat Jain  wrote:

> Hi,
>
> I wanted to know if all the unit tests pass in the hadoop-common project
> across various releases. I have never been able to get a clean run on my
> machine (Centos 6.5 / 4GB RAM / tried both Java 6 and Java 7). I have also
> attached the document which has the failures that I got while running the
> tests.
>
> I ran "mvn clean package install -DskipTests" to compile, and thereafter,
> ran "mvn test" from individual subprojects.
>
> In my company, we have forked Apache Hadoop 2.5.0 and we are planning to
> deploy a nightly unit test run to make sure we don't introduce any
> regressions. Is there a way to get a clean unit-test run, or should I
> disable these tests from our suite? I also read somewhere else that there
> are a few flaky tests as well.
>
> Any help is appreciated.
>
>
> https://docs.google.com/a/qubole.com/spreadsheets/d/1bKCclEA0u9VUZvykgaRj_gBqY4xMxTBIGPJml04TXtE/edit#gid=1215903400
>
> Thanks,
> Rajat
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Job is reported as complete on history server while on console it shows as only half way thru

2014-08-31 Thread Zhijie Shen
Do you mean multiple application attempts on YARN? One MR job shouldn't
result in multiple YARN applications. Would you please share what is shown
on the JHS and RM web UIs?


On Thu, Aug 28, 2014 at 2:01 PM, S.L  wrote:

> Hi All,
>
> I am running an MRv1 job on a Hadoop YARN 2.3.0 cluster. The problem is
> that when I submit this job, YARN creates multiple applications for it,
> and the last application running in YARN is marked as complete even though
> the console reports it as only 58% complete. I have confirmed that it is
> also not printing the log statements that it is supposed to print when the
> job is actually complete.
>
> Please see the output from the job submission console below. It just stops
> at 58%, and the job history server and YARN cluster UI report that this
> job has already succeeded.
>
> 14/08/28 08:36:19 INFO mapreduce.Job:  map 54% reduce 0%
> 14/08/28 08:44:13 INFO mapreduce.Job:  map 55% reduce 0%
> 14/08/28 08:52:16 INFO mapreduce.Job:  map 56% reduce 0%
> 14/08/28 08:59:22 INFO mapreduce.Job:  map 57% reduce 0%
> 14/08/28 09:07:33 INFO mapreduce.Job:  map 58% reduce 0%
>
> Thanks.
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Any issue with large concurrency due to single active instance of YARN Resource Manager?

2014-09-02 Thread Zhijie Shen
Hi Bo,

RM doesn't create an individual thread for each running app. The app
life-cycle management is event-driven: there's a dispatcher, which runs on
one thread to handle the events for all apps.
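
For illustration only (this is a simplified sketch of the pattern just
described, not YARN's actual dispatcher code): many apps share one
event-handling thread behind a queue.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MiniDispatcher {
  static final class AppEvent {
    final String appId;
    final String type;
    AppEvent(String appId, String type) { this.appId = appId; this.type = type; }
  }

  private final BlockingQueue<AppEvent> queue = new LinkedBlockingQueue<>();
  private final Thread worker = new Thread(() -> {
    try {
      while (true) {
        AppEvent e = queue.take(); // one thread serves events for all apps
        System.out.println("app " + e.appId + " -> " + e.type);
      }
    } catch (InterruptedException ignored) {
      // shut down
    }
  }, "event-dispatcher");

  public void start() { worker.setDaemon(true); worker.start(); }
  public void dispatch(AppEvent e) { queue.add(e); } // no per-app thread needed

  public static void main(String[] args) throws Exception {
    MiniDispatcher d = new MiniDispatcher();
    d.start();
    for (int i = 0; i < 1000; i++) { // 1000 "apps", still one worker thread
      d.dispatch(new AppEvent("app_" + i, "STARTED"));
    }
    Thread.sleep(1000); // let the worker drain the queue before exiting
  }
}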

Zhijie


On Mon, Sep 1, 2014 at 11:39 PM, bo yang  wrote:

> Hi Guys,
>
> I am wondering how many concurrent jobs a single Resource Manager might be
> able to manage. The following is my understanding; please correct me if I
> am wrong.
>
> Let's say we have 1000 concurrent jobs running. The Resource Manager will
> have 1000 records in memory to manage these jobs. And it will also have
> 1000 threads, where each thread is waiting for one job to finish.
>
> The memory part will probably be OK. For the 1000 threads, will there be
> any potential problem?
>
> Thanks,
> Bo
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Any issue with large concurrency due to single active instance of YARN Resource Manager?

2014-09-02 Thread Zhijie Shen
Hi Bo,

I don't have an exact number for the maximum concurrent jobs. FYI, the RM
is multi-threaded, but the threads work for different purposes. For
example, the scheduler and the RM state store have their own separate
threads, and RPC calls are on individual threads as well. It's complicated
to evaluate the upper bound on concurrent apps, but I've heard of YARN
deployments on clusters of thousands of nodes.

Thanks,
Zhijie


On Tue, Sep 2, 2014 at 10:42 AM, bo yang  wrote:

> Hi Zhijie,
>
> That is great to know. Thanks!
>
> So there seems to be no hard limit on supporting large concurrency. To
> take this question further, what might be the max number of concurrent
> jobs that one Resource Manager could support? Are there any numbers from
> your experience?
>
> Thanks,
> Bo
>
>
>
>
>
>
> On Tue, Sep 2, 2014 at 12:10 AM, Zhijie Shen 
> wrote:
>
>> Hi Bo,
>>
>> RM doesn't create an individual thread for each running app. The app life
>> cycle management is event driven. There's a dispatcher, which runs on one
>> thread to handle the events for all apps.
>>
>> Zhijie
>>
>>
>> On Mon, Sep 1, 2014 at 11:39 PM, bo yang  wrote:
>>
>>> Hi Guys,
>>>
>>> I am thinking how many concurrent jobs a single Resource Manager might
>>> be able to manage? Following is my understanding, please correct me if I am
>>> wrong.
>>>
>>> Let's say if we have 1000 concurrent jobs running. Resource Manager will
>>> have 1000 records in memory to manage these jobs. And it will also have
>>> 1000 threads, where each thread is waiting for one job to finish.
>>>
>>> The memory part will probably be ok. For the 1000 threads, will there be
>>> any potential problem?
>>>
>>> Thanks,
>>> Bo
>>>
>>
>>
>>
>> --
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Yarn timeline server issue

2015-04-23 Thread Zhijie Shen
Hi Udit,

Do you have the full exception stack? Perhaps you have run into the same
issue: YARN-3393.

Thanks,

Zhijie


From: Udit Mehta 
Sent: Thursday, April 23, 2015 12:08 PM
To: user@hadoop.apache.org
Subject: Yarn timeline server issue

Hi,

I have a YARN cluster set up with HDP 2.2. I have also enabled ResourceManager
HA, due to which the timeline server seems to fail every time there is a
failover. Is this a known issue, or am I doing something wrong?
I am unable to access the servlet "/applicationHistory" on the timeline server 
and these are the logs I see:
2015-04-23 18:20:55,380 ERROR webapp.View (AppsBlock.java:render(90)) - Failed 
to read the applications.
java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)

Does anyone know what could be the issue?

Thanks,
Udit


Re: Is the application tracking URL changed expected ?

2015-05-06 Thread Zhijie Shen
This is a known bug, but I thought it had been fixed by YARN-2246.


Thanks,

Zhijie


From: Jeff Zhang 
Sent: Wednesday, May 06, 2015 6:23 AM
To: user@hadoop.apache.org
Subject: Is the application tracking URL changed expected ?


I ran the distributed shell example, and the tracking URL changes after the
app moves to RUNNING state. As the following log shows, after the app moves to
RUNNING state, there is an "A" suffix on the trackingUrl. But it looks like the
suffix "A" is not used, because even if I change the suffix to any other word,
it still redirects me to the right app report URL. So I'm not sure of the
purpose of the tracking URL suffix. Is this by design or a bug?



15/05/06 21:14:53 INFO distributedshell.Client: Got application report from ASM 
for, appId=3, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, 
appQueue=default, appMasterRpcPort=-1, appStartTime=1430918063457, 
yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, 
appTrackingUrl=http://localhost:8088/proxy/application_1430916889869_0003/, 
appUser=jzhang
15/05/06 21:14:54 INFO distributedshell.Client: Got application report from ASM 
for, appId=3, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, 
appQueue=default, appMasterRpcPort=-1, appStartTime=1430918063457, 
yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, 
appTrackingUrl=http://localhost:8088/proxy/application_1430916889869_0003/, 
appUser=jzhang
15/05/06 21:14:55 INFO distributedshell.Client: Got application report from ASM 
for, appId=3, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, 
appQueue=default, appMasterRpcPort=-1, appStartTime=1430918063457, 
yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, 
appTrackingUrl=http://localhost:8088/proxy/application_1430916889869_0003/, 
appUser=jzhang
15/05/06 21:14:56 INFO distributedshell.Client: Got application report from ASM 
for, appId=3, clientToAMToken=null, appDiagnostics=, 
appMasterHost=jzhangMBPr.local/127.0.0.1, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1430918063457, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://localhost:8088/proxy/application_1430916889869_0003/A, 
appUser=jzhang
15/05/06 21:14:57 INFO distributedshell.Client: Got application report from ASM 
for, appId=3, clientToAMToken=null, appDiagnostics=, 
appMasterHost=jzhangMBPr.local/127.0.0.1, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1430918063457, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=http://localhost:8088/proxy/application_1430916889869_0003/A, 
appUser=jzhang

--
Best Regards

Jeff Zhang


Re: Lost mapreduce applications displayed in UI

2015-05-12 Thread Zhijie Shen
Maybe you have hit the completed-application limit (10000 by default, set by
yarn.resourcemanager.max-completed-applications). Once the limit is hit, the
oldest completed app will be removed from the cache.


- Zhijie


From: hitarth trivedi 
Sent: Tuesday, May 12, 2015 3:32 PM
To: user@hadoop.apache.org
Subject: Lost mapreduce applications displayed in UI

Hi,

My cluster suddenly stopped displaying application information in the UI
(http://localhost:8088/cluster/apps), although the counters like 'Apps
Submitted', 'Apps Completed', 'Apps Running', etc., all seem to increment
accurately and display the right information whenever I start a new MapReduce
job.

Any help is appreciated.

Thanks,
Hitrix


Re: Using UserGroupInformation in multithread process

2015-06-19 Thread Zhijie Shen
Do you mean TGTs or delegation tokens? Either way, they should be shared
across threads. Did you check whether you're using the same UGI object in
different threads?
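
For illustration, a sketch of the pattern implied above: both threads must
hold the same UserGroupInformation instance so that a relogin performed by
the master thread is visible to the slave thread (the principal and keytab
path below are placeholders):

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class SharedUgiExample {
  public static void main(String[] args) throws Exception {
    final UserGroupInformation ugi =
        UserGroupInformation.loginUserFromKeytabAndReturnUGI(
            "app/host@EXAMPLE.COM",               // placeholder principal
            "/etc/security/keytabs/app.keytab");  // placeholder keytab

    Thread slave = new Thread(() -> {
      try {
        // Same UGI object: sees credentials refreshed by the master thread.
        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
          // ... access HDFS/YARN here ...
          return null;
        });
      } catch (Exception e) {
        e.printStackTrace();
      }
    }, "slave");
    slave.start();

    // Master thread: the relogin refreshes the TGT inside the shared UGI.
    ugi.checkTGTAndReloginFromKeytab();
    slave.join();
  }
}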


Thanks,

Zhijie


From: Gaurav Gupta 
Sent: Thursday, June 18, 2015 11:35 PM
To: user@hadoop.apache.org
Subject: Using UserGroupInformation in multithread process

I am using UserGroupInformation to get the Kerberos tokens. I have a process
in a YARN container that spawns another thread (the slave). I am renewing the
Kerberos tokens in the master thread, but the slave thread is still using the
older tokens. Are tokens not shared across threads in the same JVM?

Thanks
Gaurav


Re: ResourceManager crashes with the unnoticeable error.

2015-07-01 Thread Zhijie Shen
Those are just some warnings from the web component; they should do no harm
to your RM. You should check the RM log: look under HADOOP_YARN_HOME/logs, or
wherever YARN_LOG_DIR points, which is where the daemon log lives.


Thanks,

Zhijie


From: xeonmailinglist-gmail 
Sent: Wednesday, July 01, 2015 10:18 AM
To: user@hadoop.apache.org
Cc: Ted Yu
Subject: Re: ResourceManager crashes with the unnoticeable error.

I have no file /var/log/messages.

I am using hadoop-2.6.0

Wellington:~/repositories/git/hadoop-2.6.0$ ./sbin/start-yarn.sh


On 07/01/2015 05:56 PM, Ted Yu wrote:
Can you check /var/log/messages to see if there is some clue ?

Which hadoop release are you using ?

Can you provide the command line for the resource manager ?

Thanks

On Wed, Jul 1, 2015 at 9:38 AM, xeonmailinglist-gmail 
mailto:xeonmailingl...@gmail.com>> wrote:

I am running Hadoop MRv2 on a cluster with 4 nodes; Java 8 is installed.
I start the ResourceManager and the NodeManagers normally, but during
execution the ResourceManager crashes with the error below. Any help solving
this? Is it a problem related to the Java heap, or memory?


Jul 01, 2015 12:21:05 PM 
com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
 get
WARNING: You are attempting to use a deprecated API (specifically, attempting 
to @Inject ServletContext inside an eagerly created singleton. While we allow 
this for backwards compatibility, be warned that this MAY have unexpected 
behavior if you have more than one injector (with ServletModule) running in the 
same JVM. Please consult the Guice documentation at 
http://code.google.com/p/google-guice/wiki/Servlets for more information.
Jul 01, 2015 12:21:06 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering 
org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver as a 
provider class
Jul 01, 2015 12:21:06 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices as a root 
resource class
Jul 01, 2015 12:21:06 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a 
provider class
Jul 01, 2015 12:21:06 PM 
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Jul 01, 2015 12:21:06 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding 
org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver to 
GuiceManagedComponentProvider with the scope "Singleton"
Jul 01, 2015 12:21:07 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to 
GuiceManagedComponentProvider with the scope "Singleton"
Jul 01, 2015 12:21:08 PM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices to 
GuiceManagedComponentProvider with the scope "Singleton"


-- 
Thanks,


Re: accessing hadoop job history

2015-07-01 Thread Zhijie Shen
"output" here means the path, on HDFS, to the history file of the job you want to view.


Thanks,

Zhijie


From: mehdi benchoufi 
Sent: Saturday, June 20, 2015 12:09 PM
To: user@hadoop.apache.org
Subject: accessing hadoop job history

Hi,

I am new to Hadoop, and when I run

hadoop job -history output

I get this

Ignore unrecognized file: output
Exception in thread "main" java.io.IOException: Unable to initialize 
History Viewer
at 
org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:90)
at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:487)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:330)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1237)
Caused by: java.io.IOException: Unable to initialize History Viewer
at 
org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:84)
... 5 more


I checked the logs (the history server logs,
`mapred-*username*-historyserver-**.local.log`), and they are empty. How can
I solve this?

Best regards,
Mehdi


Re: Problem when configure the security in hadoop

2015-07-07 Thread Zhijie Shen
Not sure about any HDFS-specific setup, but in general, to use HTTPS, you
should have your keystore/truststore generated and configure ssl-client.xml
and ssl-server.xml properly.
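
For example, a self-signed keystore for testing can be generated with
keytool (the alias, password, and path below are illustrative):

keytool -genkeypair -alias hadoop -keyalg RSA \
    -keystore /etc/hadoop/ssl/keystore.jks -storepass changeit

The resulting location and password then go into ssl-server.xml
(ssl.server.keystore.location and ssl.server.keystore.password), with the
corresponding truststore settings in ssl-client.xml.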


- Zhijie


From: Colin Ma 
Sent: Friday, July 03, 2015 1:37 AM
To: user@hadoop.apache.org
Subject: Problem when configure the security in hadoop

Hi,
 I have been doing the security configuration for Hadoop these days. Kerberos
works fine, but there may be some problems with the SASL configuration.
 The following is the related configuration in hdfs-site.xml:

<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>

There is no problem executing a command like:   hdfs dfs -ls /
But when I execute the command:   hdfs dfs -copyToLocal /temp/test.txt .  The
following exception is thrown:

 2015-07-03 14:02:54,715 INFO 
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added 
bpid=BP-271423801-192.168.20.28-1423724265164 to blockPoolScannerMap, new size=1
2015-07-03 14:03:39,963 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
server-511:50010:DataXceiver error processing unknown operation  src: 
/192.168.20.28:58422 dst: 
/192.168.20.28:50010
java.io.EOFException: Premature EOF: no length prefix available
 at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiationCipherOptions(DataTransferSaslUtil.java:233)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:369)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:297)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:124)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
 at java.lang.Thread.run(Thread.java:745)
2015-07-03 15:34:39,917 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Sent 1 blockreports 145 blocks total. Took 1 msec to generate and 6 msecs for 
RPC and NN processing.  Got back commands 
org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@1b3bce82

 Just take a look at the doSaslHandshake() method of
SaslDataTransferClient.java and SaslDataTransferServer.java; maybe
SaslDataTransferClient sends an empty response, causing this exception, and I
think some mistake in the configuration caused this problem.
Can anyone help check this problem?
Thanks for your help.

Best regards,

Colin Ma