Re: Regarding containers not launching

2018-01-30 Thread nishchay malhotra
Yes, my job has about 160,000 maps and my cluster is not getting fully
utilized. Around 6,000 maps ran over 2 hours, and then I killed the job. At
any point in time only 40 containers are running; that's just 11% of my
cluster capacity.

{
  "classification": "mapred-site",
  "properties": {
    "mapreduce.job.reduce.slowstart.completedmaps": "1",
    "mapreduce.reduce.memory.mb": "3072",
    "mapreduce.map.memory.mb": "2208",
    "mapreduce.map.java.opts": "-Xmx1800m",
    "mapreduce.map.cpu.vcores": "1"
  }
},
{
  "classification": "yarn-site",
  "properties": {
    "yarn.scheduler.minimum-allocation-mb": "32",
    "yarn.scheduler.maximum-allocation-mb": "253952",
    "yarn.scheduler.maximum-allocation-vcores": "128",
    "yarn.nodemanager.vmem-pmem-ratio": "3",
    "yarn.nodemanager.vmem-check-enabled": "true",
    "yarn.nodemanager.resource.cpu-vcores": "16",
    "yarn.nodemanager.resource.memory-mb": "23040"
  }
}

Each node's capacity:
disk space = 100 GB
memory = 28 GB
processors = 8
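
For reference, a rough upper bound on concurrent map containers under these
settings, assuming memory rather than vcores is the binding resource and all
24 nodes are healthy:

  containers per node = floor(yarn.nodemanager.resource.memory-mb
                              / mapreduce.map.memory.mb)
                      = floor(23040 / 2208) = 10
  cluster-wide        = 10 containers/node x 24 nodes = ~240 concurrent maps

So 40 running containers is well below what the memory settings alone should
allow.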


Kerberos impersonation question

2018-01-30 Thread Bear Giles
Back with a Kerberos impersonation question. The hadoop.proxyuser.*
settings are correct; at least, the same settings worked on a different
cluster that doesn't require Kerberos authentication.

I can perform my action as the basic user.

When I use the same UGI code, add

  user = UserGroupInformation.createProxyUser("new user", user);

and attempt to perform the same action I get:

java.io.IOException: Failed on local exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot
authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "
cdhclusterqa-2-2.clouddev.snaplogic.com/10.164.199.241"; destination host
is: "cdhclusterqa-2-1.clouddev.snaplogic.com":8020;

Nothing else has changed. Literally, it's a checkbox toggle that does
nothing but conditionally call the createProxyUser line above.

Any ideas? I did a 'relogin from keytab file' with the original user -
would I need to do that after the proxy call?
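
For reference, the overall pattern looks roughly like this (a minimal
sketch; the principal, keytab path, and probe path are placeholders):

  import java.security.PrivilegedExceptionAction;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.security.UserGroupInformation;

  public class ProxyUserSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          UserGroupInformation.setConfiguration(conf);

          // Basic Kerberos login from the keytab; this part works.
          UserGroupInformation realUser =
                  UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                          "svc-principal@EXAMPLE.COM", "/path/to/svc.keytab");

          // The impersonation step; this is the part that fails for me.
          UserGroupInformation proxyUser =
                  UserGroupInformation.createProxyUser("new user", realUser);

          // The HDFS call runs inside doAs() so the proxy UGI's Subject is
          // attached to the calling context for the RPC.
          boolean exists = proxyUser.doAs(
                  (PrivilegedExceptionAction<Boolean>) () ->
                          FileSystem.get(conf).exists(new Path("/tmp/probe")));
          System.out.println("exists: " + exists);
      }
  }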

(Hmm... I'm not familiar with this code but looking at the stack trace I
realize that the HDFS call is being made in a separate thread from the one
that acquired the original UGI credentials. The thread is created in a
privileged action so it has the basic information but may not have all
threadlocal information. I don't know why that decision was made. It's
suspicious... but the basic Kerberos authentication works. It's the
impersonation that's failing.)

FWIW the bottommost few exceptions are:

  exc: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details
  exc:  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
  exc:  at org.apache.hadoop.ipc.Client.call(Client.java:1480)
  exc:  at org.apache.hadoop.ipc.Client.call(Client.java:1407)
  exc:  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
  exc:  at com.sun.proxy.$Proxy91.getFileInfo(Unknown Source)
  exc:  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
  exc:  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  exc:  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  exc:  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  exc:  at java.lang.reflect.Method.invoke(Method.java:497)
  exc:  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
  exc:  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  exc:  at com.sun.proxy.$Proxy92.getFileInfo(Unknown Source)
  exc:  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2113)
  exc:  at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
  exc:  at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
  exc:  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  exc:  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
  exc:  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
  exc:  at com.snaplogic.snap.api.fs.hdfs.HdfsUrlConnection.attemptHdfsCreate(HdfsUrlConnection.java:227)
  exc:  at com.snaplogic.snap.api.fs.hdfs.HdfsUrlConnection.access$500(HdfsUrlConnection.java:62)
  exc:  at com.snaplogic.snap.api.fs.hdfs.HdfsUrlConnection$3.run(HdfsUrlConnection.java:196)
  exc:  at com.snaplogic.snap.api.fs.hdfs.HdfsUrlConnection$3.run(HdfsUrlConnection.java:191)
  exc:  at java.security.AccessController.doPrivileged(Native Method)
  exc:  at javax.security.auth.Subject.doAs(Subject.java:422)
  exc:  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  exc:  at com.snaplogic.snap.api.fs.hdfs.HdfsUrlConnection.getOutputStream(HdfsUrlConnection.java:190)
  exc:  at com.snaplogic.snap.api.binary.SimpleWriter$GetOutputStream.call(SimpleWriter.java:145)
  exc:  at com.snaplogic.snap.api.binary.SimpleWriter$GetOutputStream.call(SimpleWriter.java:136)
  exc:  at ...

Re: Can't launch Flink on Hadoop cluster: Call From localhost.localdomain/127.0.0.1 to localhost:36063 failed

2018-01-30 Thread Julio Biason
Oh, forgot to mention: the application is submitted successfully:

2018-01-30 17:28:38,157 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1517332236216_0002
2018-01-30 17:28:38,582 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1517332236216_0002
2018-01-30 17:28:38,582 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2018-01-30 17:28:38,711 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED

On Tue, Jan 30, 2018 at 3:38 PM, Julio Biason wrote:

> Hi,
>
> I'm trying to launch Flink as a long-running YARN app (on Hadoop 2.8.3),
> which should be as simple as running `./yarn-session.sh`, but something is
> not working here.
>
> When I run said command, I get the following error:
>
> Error while deploying YARN cluster: Couldn't deploy Yarn session cluster
> [...]
> Diagnostics from YARN: Application application_1517332236216_0002 failed 1
> times (global limit =2; local limit is =1) due to Error launching
> appattempt_1517332236216_0002_01. Got exception:
> java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to
> localhost:36063 failed on connection exception: java.net.ConnectException:
> Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
>
> So here is the weird part: I'm only running the namenode and datanode on
> that server. I want it to be the master machine, so I'm not running the
> resourcemanager on it. There is another machine running the
> resourcemanager, and I did the whole shebang to make it visible to the
> master node.
>
> So I'm not sure why it's trying to connect to localhost or even who should
> be listening on port 36063. At first I thought it was a problem with the
> Flink package (dunno, maybe they hardcoded that port in their package or
> something) but it really seems to be a problem with my install (and I have
> absolutely no more ideas on what to look for).
>
> Anything else I should be looking for?
>
> --
> *Julio Biason*, Software Engineer
> *AZION*  |  Deliver. Accelerate. Protect.
> Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554
>



-- 
*Julio Biason*, Software Engineer
*AZION*  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554


Can't launch Flink on Hadoop cluster: Call From localhost.localdomain/127.0.0.1 to localhost:36063 failed

2018-01-30 Thread Julio Biason
Hi,

I'm trying to launch Flink as a long-running YARN app (on Hadoop 2.8.3),
which should be as simple as running `./yarn-session.sh`, but something is
not working here.

When I run said command, I get the following error:

Error while deploying YARN cluster: Couldn't deploy Yarn session cluster
[...]
Diagnostics from YARN: Application application_1517332236216_0002 failed 1
times (global limit =2; local limit is =1) due to Error launching
appattempt_1517332236216_0002_01. Got exception:
java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to
localhost:36063 failed on connection exception: java.net.ConnectException:
Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused

So here is the weird part: I'm only running the namenode and datanode on
that server. I want it to be the master machine, so I'm not running the
resourcemanager on it. There is another machine running the resourcemanager,
and I did the whole shebang to make it visible to the master node.

So I'm not sure why it's trying to connect to localhost or even who should
be listening on port 36063. At first I thought it was a problem with the
Flink package (dunno, maybe they hardcoded that port in their package or
something) but it really seems to be a problem with my install (and I have
absolutely no more ideas on what to look for).

Anything else I should be looking for?
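
One guess at where to look, from the yarn-default settings:
yarn.nodemanager.address defaults to ${yarn.nodemanager.hostname}:0, i.e. an
ephemeral port, which would explain the 36063. And if the worker's hostname
resolves to 127.0.0.1 in /etc/hosts, the resourcemanager would then try to
launch the AM at localhost. A sketch of what I'd check in yarn-site.xml on
each worker (not a confirmed fix; worker-1.example.com is a placeholder for
the real hostname):

  <!-- advertise a resolvable hostname so the resourcemanager does not
       try to reach this nodemanager at localhost -->
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>worker-1.example.com</value>
  </property>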

-- 
*Julio Biason*, Software Engineer
*AZION*  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554


Re: Regarding containers not launching

2018-01-30 Thread Billy Watson
Is your job able to use more containers? I.e., does your job have tasks
waiting, or are all tasks in progress?

William Watson


On Tue, Jan 30, 2018 at 1:56 AM, nishchay malhotra <
nishchay.malht...@gmail.com> wrote:

> What should I be looking for if my 24-node cluster is not launching enough
> containers?
> Only 40/288 cores are in use and 87 GB/700 GB of memory is used.
> The yarn.nodemanager memory/core settings look good, and so do the
> container memory/core settings.
>
> Thanks
> Nishchay Malhotra
>


Need your input for a research survey: Waste in Software Engineering Organisations

2018-01-30 Thread Hiva Alahyari
Dear Software Developers!

I'm writing to you because I need your insight for my research study, and I
would be grateful if you could participate in my survey.

As part of my research project at Chalmers University, I have conducted a
survey about 23 types of waste (collected from the literature and through
studies) that software developers, or anyone involved in software development
activities, might experience in their work or organizations.

"In Lean thinking, anything that doesn’t add value is considered Waste."

Now I really need your input to help us understand which wastes are the most
important according to you and your experience.

I would really appreciate it if you participated in my survey (approximately
15 minutes) and also spread it amongst your team and colleagues, to help us
correct and improve our research.
https://goo.gl/forms/apdBnDLXecV4XIs93

I hope you will also find it interesting to read about these wastes and to
see which ones occur in your work and/or organization.

Many thanks in advance for your help and support! :)

Cheers,
/Hiva