Building with Hadoop 3

2019-12-03 Thread Foster, Craig
Hi:
I don’t see a JIRA for Hadoop 3 support. I see a comment on a JIRA here from a 
year ago that no one is looking into Hadoop 3 support [1]. Is there a document 
or JIRA that now exists which would point to what needs to be done to support 
Hadoop 3? Right now builds with Hadoop 3 don’t work obviously because there’s 
no flink-shaded-hadoop-3 artifacts.

Thanks!
Craig

[1] https://issues.apache.org/jira/browse/FLINK-11086



Re: Flink Kinesis connector in 1.3.0

2017-06-13 Thread Foster, Craig
Oh, sorry. I'm not using the distributed artifacts but trying to build from 
source. Building the connector with Maven 3.2.2 doesn't give me a jar for some 
reason.

From: Chesnay Schepler <ches...@apache.org>
Date: Tuesday, June 13, 2017 at 1:44 PM
To: "Foster, Craig" <foscr...@amazon.com>, "user@flink.apache.org" 
<user@flink.apache.org>, Robert Metzger <rmetz...@apache.org>
Subject: Re: Flink Kinesis connector in 1.3.0

Here's the relevant JIRA: https://issues.apache.org/jira/browse/FLINK-6812

Apologies if I was unclear; I meant that you could use the 1.3-SNAPSHOT version 
of the Kinesis connector, as it is compatible with 1.3.0.
Alternatively you can take the 1.3.0 sources and build the connector manually.

As far as I'm aware there is no plan to retroactively release a 1.3.0 artifact.


I'm not aware of the missing httpclient dependency, pulling in Robert who may 
know something about it.


On 13.06.2017 21:00, Foster, Craig wrote:
So, in addition to the question below, can we clarify whether there is a 
patch/fix/JIRA available, since I have to use 1.3.0?

From: "Foster, Craig" <foscr...@amazon.com><mailto:foscr...@amazon.com>
Date: Tuesday, June 13, 2017 at 9:27 AM
To: Chesnay Schepler <ches...@apache.org>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: Flink Kinesis connector in 1.3.0

Thanks! Does this also explain why commons HttpClient is not included in 
flink-dist-*?

From: Chesnay Schepler <ches...@apache.org>
Date: Tuesday, June 13, 2017 at 8:53 AM
To: "user@flink.apache.org"<mailto:user@flink.apache.org> 
<user@flink.apache.org><mailto:user@flink.apache.org>
Subject: Re: Flink Kinesis connector in 1.3.0

Something went wrong during the release process which prevented the 1.3.0 
kinesis artifact from being released.

This will be fixed for 1.3.1; in the meantime you can use 1.3.0-SNAPSHOT 
instead.

On 13.06.2017 17:48, Foster, Craig wrote:
Hi:
I’m trying to build an application that uses the Flink Kinesis Connector in 
1.3.0. However, I don’t see that resolving anymore. It resolved with 1.2.x but 
doesn't with 1.3.0. Is there something I now need to do differently from what's 
described here?
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kinesis.html

Thanks,
Craig






Re: Flink Kinesis connector in 1.3.0

2017-06-13 Thread Foster, Craig
So, in addition to the question below, can we clarify whether there is a 
patch/fix/JIRA available, since I have to use 1.3.0?

From: "Foster, Craig" <foscr...@amazon.com>
Date: Tuesday, June 13, 2017 at 9:27 AM
To: Chesnay Schepler <ches...@apache.org>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: Flink Kinesis connector in 1.3.0

Thanks! Does this also explain why commons HttpClient is not included in 
flink-dist-*?

From: Chesnay Schepler <ches...@apache.org>
Date: Tuesday, June 13, 2017 at 8:53 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Flink Kinesis connector in 1.3.0

Something went wrong during the release process which prevented the 1.3.0 
kinesis artifact from being released.

This will be fixed for 1.3.1; in the meantime you can use 1.3.0-SNAPSHOT 
instead.

On 13.06.2017 17:48, Foster, Craig wrote:
Hi:
I’m trying to build an application that uses the Flink Kinesis Connector in 
1.3.0. However, I don’t see that resolving anymore. It resolved with 1.2.x but 
doesn't with 1.3.0. Is there something I now need to do differently from what's 
described here?
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kinesis.html

Thanks,
Craig




Re: Flink Kinesis connector in 1.3.0

2017-06-13 Thread Foster, Craig
Thanks! Does this also explain why commons HttpClient is not included in 
flink-dist-*?

From: Chesnay Schepler <ches...@apache.org>
Date: Tuesday, June 13, 2017 at 8:53 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Flink Kinesis connector in 1.3.0

Something went wrong during the release process which prevented the 1.3.0 
kinesis artifact from being released.

This will be fixed for 1.3.1; in the meantime you can use 1.3.0-SNAPSHOT 
instead.

On 13.06.2017 17:48, Foster, Craig wrote:
Hi:
I’m trying to build an application that uses the Flink Kinesis Connector in 
1.3.0. However, I don’t see that resolving anymore. It resolved with 1.2.x but 
doesn’t with 1.3.0. Is there something I need to now do differently than 
described here?
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kinesis.html

Thanks,
Craig




Re: How to run a Flink job in EMR?

2017-06-07 Thread Foster, Craig
Ah, maybe (1) wasn't entirely clear, so here's a copy/pasted example of what I 
suggested:

HadoopJarStepConfig copyJar = new HadoopJarStepConfig()
  .withJar("command-runner.jar")
  .withArgs("bash", "-c", "aws s3 cp s3://mybucket/myjar.jar /home/hadoop");
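
And a rough sketch of the follow-up step that actually runs the job (the jar 
path and the Flink arguments below are placeholders):

// StepConfig and HadoopJarStepConfig are from the AWS SDK for Java
// (com.amazonaws.services.elasticmapreduce.model)
HadoopJarStepConfig runFlinkJob = new HadoopJarStepConfig()
  .withJar("command-runner.jar")
  .withArgs("bash", "-c",
      "flink run -m yarn-cluster -yn 2 -p 4 /home/hadoop/myjar.jar");

StepConfig step = new StepConfig()
  .withName("Run Flink job")
  .withHadoopJarStep(runFlinkJob)
  .withActionOnFailure("CONTINUE");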


From: "Foster, Craig" <foscr...@amazon.com>
Date: Wednesday, June 7, 2017 at 7:21 PM
To: Chris Schneider <cschnei...@scaleunlimited.com>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: How to run a Flink job in EMR?


1)  Since the jar is only required on the master node, you should be able to 
just run a step with a very simple script like 'bash -c "aws s3 cp 
s3://mybucket/myjar.jar ."'.
You would do that using a step similar to the one outlined in the EMR 
documentation, but passing the above command to withArgs (I think there's an 
example of this on the same EMR docs page you refer to).
Then add another step after that which actually runs the Flink job. The jar 
will be located in /home/hadoop. In the future, I'm hoping this can be 
simplified to just flink run -yn 2 -p 4 s3://mybucket/myjar.jar …, but that 
doesn't seem to be the case right now.

2)  If you ran this as a step, you should be able to see the error the 
Flink driver gives in the step’s logs.

3)  Provided your S3 bucket and your EMR cluster's EC2 IAM role/"instance 
profile" belong to the same account (or at least the permissions are set up 
such that you can download a file from S3 to your EC2 instances), you should be 
able to use the 
DefaultAWSCredentialsProviderChain<http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html>, 
which won't require you to enter any credentials, as it uses the EC2 instance 
profile credentials provider.
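
For reference, creating an S3 client that relies on that chain looks roughly 
like this (the bucket, key, and region are placeholders):

// AmazonS3ClientBuilder and DefaultAWSCredentialsProviderChain are from the AWS SDK for Java
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
    .withRegion("us-east-1")
    .withCredentials(new DefaultAWSCredentialsProviderChain())
    .build();
S3Object jar = s3.getObject("mybucket", "myjar.jar");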


Hope that helps.

Thanks,
Craig


From: Chris Schneider <cschnei...@scaleunlimited.com>
Date: Wednesday, June 7, 2017 at 6:16 PM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: How to run a Flink job in EMR?

Hi Gang,

I've been trying to get some Flink code running in Amazon Web Services' 
Elastic MapReduce, but so far the only success I've had required me to log into 
the master node, download my jar from S3 there, and then run it on the 
master node from the command line using something like the following:

% bin/flink run -m yarn-cluster -yn 2 -p 4  

The two other approaches I’ve tried (based on the AWS EMR Flink 
documentation<http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html>)
 that didn’t work were:

1) Add an EMR Step to launch my program as part of a Flink session - I couldn’t 
figure out how to get my job jar deployed as part of the step, and I couldn’t 
successfully configure a Bootstrap 
Action<http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html#bootstrapCustom>
 to deploy it before running that step.

2) Start a Long-Running Flink Session via an EMR Step (which worked) and then 
use the Flink Web UI to upload my job jar from my workstation - It killed the 
ApplicationMaster that was running the Flink Web UI without providing much 
interesting logging. I’ve appended both the container log output and the 
jobmanager.log contents to the end of this email.
In addition, it would be nice to gain access to S3 resources using credentials. 
I’ve tried using an AmazonS3ClientBuilder, and passing an 
EnvironmentVariableCredentialsProvider<http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EnvironmentVariableCredentialsProvider.html>
 to its setCredentials method. I’d hoped that this might pick up the 
credentials I set up on my master node in the $AWS_ACCESS_KEY_ID and 
$AWS_SECRET_KEY environment variables I've exported, but I’m guessing that the 
shell this code is running in (on the slaves?) doesn’t have access to those 
variables.
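
For reference, that kind of client construction looks roughly like this (the 
region is a placeholder):

// EnvironmentVariableCredentialsProvider reads AWS_ACCESS_KEY_ID / AWS_SECRET_KEY
// from the environment of the JVM process that runs this code
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
    .withRegion("us-east-1")
    .withCredentials(new EnvironmentVariableCredentialsProvider())
    .build();
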
Here’s a list of interesting version numbers:

flink-java-1.2.0.jar
flink-core-1.2.0.jar
flink-annotations-1.2.0.jar
emr-5.4.0 with Flink 1.2.0 installed

Any help would be greatly appreciated. I’m lusting after an example showing how 
to deploy a simple Flink jar from S3 to a running EMR cluster and then get 
Flink to launch it with an arbitrary set of Flink and user arguments. Bonus 
points for setting up an AmazonS3 Java client object without including those 
credentials within my Java source code.

Best Regards,

- Chris

Here’s the container logging from my attempt to submit my job via the Flink web 
UI:
Application application_1496707031947_0002 failed 1 times due to AM Container 
for appattempt_1496707031947_0002_01 exited with exitCode: 255
For more detailed output, check application tracking 
page:http://ip-10-85-61-122.ec2.internal:8088/cluster/app/application_1496707031947_0002
 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1496707

Re: How to run a Flink job in EMR?

2017-06-07 Thread Foster, Craig
1)   Since the jar is only required on the master node, you should be able 
to just run a step with a very simple script like 'bash -c "aws s3 cp 
s3://mybucket/myjar.jar ."'.
You would do that using a step similar to the one outlined in the EMR 
documentation, but passing the above command to withArgs (I think there's an 
example of this on the same EMR docs page you refer to).
Then add another step after that which actually runs the Flink job. The jar 
will be located in /home/hadoop. In the future, I'm hoping this can be 
simplified to just flink run -yn 2 -p 4 s3://mybucket/myjar.jar …, but that 
doesn't seem to be the case right now.

2)   If you ran this as a step, you should be able to see the error the 
Flink driver gives in the step’s logs.

3)   Provided your S3 bucket and your EMR cluster's EC2 IAM role/"instance 
profile" belong to the same account (or at least the permissions are set up 
such that you can download a file from S3 to your EC2 instances), you should 
be able to use the 
DefaultAWSCredentialsProviderChain, 
which won't require you to enter any credentials, as it uses the EC2 instance 
profile credentials provider.


Hope that helps.

Thanks,
Craig


From: Chris Schneider 
Date: Wednesday, June 7, 2017 at 6:16 PM
To: "user@flink.apache.org" 
Subject: How to run a Flink job in EMR?

Hi Gang,

I've been trying to get some Flink code running in Amazon Web Services' 
Elastic MapReduce, but so far the only success I've had required me to log into 
the master node, download my jar from S3 there, and then run it on the 
master node from the command line using something like the following:

% bin/flink run -m yarn-cluster -yn 2 -p 4  

The two other approaches I’ve tried (based on the AWS EMR Flink 
documentation)
 that didn’t work were:

1) Add an EMR Step to launch my program as part of a Flink session - I couldn’t 
figure out how to get my job jar deployed as part of the step, and I couldn’t 
successfully configure a Bootstrap 
Action
 to deploy it before running that step.

2) Start a Long-Running Flink Session via an EMR Step (which worked) and then 
use the Flink Web UI to upload my job jar from my workstation - It killed the 
ApplicationMaster that was running the Flink Web UI without providing much 
interesting logging. I’ve appended both the container log output and the 
jobmanager.log contents to the end of this email.
In addition, it would be nice to gain access to S3 resources using credentials. 
I’ve tried using an AmazonS3ClientBuilder, and passing an 
EnvironmentVariableCredentialsProvider
 to its setCredentials method. I’d hoped that this might pick up the 
credentials I set up on my master node in the $AWS_ACCESS_KEY_ID and 
$AWS_SECRET_KEY environment variables I've exported, but I’m guessing that the 
shell this code is running in (on the slaves?) doesn’t have access to those 
variables.
Here’s a list of interesting version numbers:

flink-java-1.2.0.jar
flink-core-1.2.0.jar
flink-annotations-1.2.0.jar
emr-5.4.0 with Flink 1.2.0 installed

Any help would be greatly appreciated. I’m lusting after an example showing how 
to deploy a simple Flink jar from S3 to a running EMR cluster and then get 
Flink to launch it with an arbitrary set of Flink and user arguments. Bonus 
points for setting up an AmazonS3 Java client object without including those 
credentials within my Java source code.

Best Regards,

- Chris

Here’s the container logging from my attempt to submit my job via the Flink web 
UI:
Application application_1496707031947_0002 failed 1 times due to AM Container 
for appattempt_1496707031947_0002_01 exited with exitCode: 255
For more detailed output, check application tracking 
page:http://ip-10-85-61-122.ec2.internal:8088/cluster/app/application_1496707031947_0002
 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1496707031947_0002_01_01
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at 

Re: Alternatives to FLINK_CONF_DIR, HADOOP_CONF_DIR, YARN_CONF_DIR

2017-03-23 Thread Foster, Craig
Thanks! I looked at the config.sh file, which more or less works with both the 
configuration file and these environment variables. After inspecting it, it 
doesn't make sense to set FLINK_CONF_DIR in that config file, since that 
location determines where we would look for the config file in the first place.

However, I thought I could add a patch to that shell script to at least allow 
someone to set the Hadoop and YARN configuration directories in the config 
file.

From: "Bajaj, Abhinav" <abhinav.ba...@here.com>
Date: Thursday, March 23, 2017 at 10:42 AM
To: "Foster, Craig" <foscr...@amazon.com>
Cc: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Alternatives to FLINK_CONF_DIR, HADOOP_CONF_DIR, YARN_CONF_DIR

I think the FLINK_CONF_DIR points to the conf directory. This is the place 
where the Flink CLI looks for the flink-conf.yaml file.

I think there is an alternate option for HADOOP_CONF_DIR, YARN_CONF_DIR but I 
am not sure.
Check this 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#common-options
Config option - "fs.hdfs.hadoopconf"

~ Abhi


From: "Foster, Craig" <foscr...@amazon.com>
Reply-To: "user@flink.apache.org" <user@flink.apache.org>
Date: Thursday, March 23, 2017 at 9:43 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Alternatives to FLINK_CONF_DIR, HADOOP_CONF_DIR, YARN_CONF_DIR

Can I set these in the configuration file? That would be ideal for me, versus 
environment variables, but I'm not seeing it in the documentation.

Thanks,
Craig



Alternatives to FLINK_CONF_DIR, HADOOP_CONF_DIR, YARN_CONF_DIR

2017-03-23 Thread Foster, Craig
Can I set these in the configuration file? That would be ideal for me, versus 
environment variables, but I'm not seeing it in the documentation.

Thanks,
Craig



Re: Return of Flink shading problems in 1.2.0

2017-03-17 Thread Foster, Craig
Ping. So I've built with Maven 3.0.5 and it does give proper shading, which 
gives me yet another workaround where my only recourse is to cap the Maven 
version. Still, I feel there should be a long-term fix at some point.

I also believe there is a regression in Flink 1.2.0 for Maven 3.3.x with the 
process as documented, so I'm hoping someone can at least reproduce it or let 
me know of a new workaround for 3.3.x.

Thanks!
Craig

From: "Foster, Craig" <foscr...@amazon.com>
Reply-To: "user@flink.apache.org" <user@flink.apache.org>
Date: Friday, March 17, 2017 at 7:23 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Cc: Ufuk Celebi <u...@apache.org>, Robert Metzger <rmetz...@apache.org>, 
Stephan Ewen <se...@apache.org>
Subject: Re: Return of Flink shading problems in 1.2.0

Hey Stephan:
I am building twice in every case described in my previous mail. Well, building 
then rebuilding the flink-dist submodule.

This was fixed in BigTop but I started seeing this issue again with Flink 
1.2.0. I was wondering if there's something else in the environment that could 
prevent the shading from working, because it isn't working now, even with the 
workaround.

On Mar 17, 2017, at 4:08 AM, Stephan Ewen <se...@apache.org> wrote:
Hi Craig!

Maven 3.3.x has a shading problem. You need to build two times, once from root, 
once inside "flink-dist". Have a look here:

https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/building.html#dependency-shading

Maybe that was missed in BigTop?

I am wondering if we should actually throw an error if building with Maven 
3.3.x - too many users run into that issue.

Stephan



On Fri, Mar 17, 2017 at 8:14 AM, Ufuk Celebi <u...@apache.org> wrote:
Pulling in Robert and Stephan who know the project's shading setup the best.

On Fri, Mar 17, 2017 at 6:52 AM, Foster, Craig <foscr...@amazon.com> wrote:
> Hi:
>
> A few months ago, I was building Flink and ran into shading issues for
> flink-dist as described in your docs. We resolved this in BigTop by adding
> the correct way to build flink-dist in the do-component-build script and
> everything was fine after that.
>
>
>
> Now, I’m running into issues doing the same now in Flink 1.2.0 and I’m
> trying to figure out what’s changed and how to fix it. Here’s how the
> flink-dist jar looks with proper shading:
>
>
>
> jar -tvf /usr/lib/flink/lib/flink-dist_2.10-1.1.4.jar | grep
> HttpConnectionParams
> 2485 Tue Jan 01 00:00:00 UTC 1980
> org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
> 3479 Tue Jan 01 00:00:00 UTC 1980
> org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
>
>
>
> When I build Flink 1.2.0 in BigTop, here’s shading for the jar found in the
> RPM:
>
>
>
> jar -tvf flink-dist_2.10-1.2.0.jar | grep HttpConnectionParams
> 2392 Tue Jan 01 00:00:00 GMT 1980
> org/apache/commons/httpclient/params/HttpConnectionParams.class
> 2485 Tue Jan 01 00:00:00 GMT 1980
> org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
> 3479 Tue Jan 01 00:00:00 GMT 1980
> org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
> 2868 Tue Jan 01 00:00:00 GMT 1980
> org/apache/http/params/HttpConnectionParams.class
>
>
>
> I thought maybe it was some strange thing going on with BigTop, so then I
> tried just straight building Flink 1.2.0 (outside BigTop) and get the same
> shading:
>
>
>
> jar -tvf flink-dist_2.10-1.2.0.jar | grep HttpConnectionParams
>
>   2485 Fri Mar 17 05:41:16 GMT 2017
> org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
>
>   3479 Fri Mar 17 05:41:16 GMT 2017
> org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
>
>   2392 Fri Mar 17 05:41:24 GMT 2017
> org/apache/commons/httpclient/params/HttpConnectionParams.class
>
>   2868 Fri Mar 17 05:41:24 GMT 2017
> org/apache/http/params/HttpConnectionParams.class
>
>
>
> And, yes, this is after going into flink-dist and running mvn clean install
> again since I am using Maven 3.3.x.
>
>
>
> Here’s a snippet from my Maven version:
>
> mvn -version
>
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> 2015-11-10T16:41:47+00:00)
>
> Maven home: /usr/local/apache-maven
>
> Java version: 1.8.0_121, vendor: Oracle Corporation
>
>
>
> Any ideas on what my problem might be here?
>
>
>
> Thanks,
>
> Craig
>
>



Re: Return of Flink shading problems in 1.2.0

2017-03-17 Thread Foster, Craig
Hey Stephan:
I am building twice in every case described in my previous mail. Well, building 
then rebuilding the flink-dist submodule.

This was fixed in BigTop but I started seeing this issue again with Flink 
1.2.0. I was wondering if there's something else in the environment that could 
prevent the shading from working, because it isn't working now, even with the 
workaround.

On Mar 17, 2017, at 4:08 AM, Stephan Ewen <se...@apache.org> wrote:

Hi Craig!

Maven 3.3.x has a shading problem. You need to build two times, once from root, 
once inside "flink-dist". Have a look here:

https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/building.html#dependency-shading

Maybe that was missed in BigTop?

I am wondering if we should actually throw an error if building with Maven 
3.3.x - too many users run into that issue.

Stephan



On Fri, Mar 17, 2017 at 8:14 AM, Ufuk Celebi <u...@apache.org> wrote:
Pulling in Robert and Stephan who know the project's shading setup the best.

On Fri, Mar 17, 2017 at 6:52 AM, Foster, Craig <foscr...@amazon.com> wrote:
> Hi:
>
> A few months ago, I was building Flink and ran into shading issues for
> flink-dist as described in your docs. We resolved this in BigTop by adding
> the correct way to build flink-dist in the do-component-build script and
> everything was fine after that.
>
>
>
> Now, I'm running into issues doing the same now in Flink 1.2.0 and I'm
> trying to figure out what's changed and how to fix it. Here's how the
> flink-dist jar looks with proper shading:
>
>
>
> jar -tvf /usr/lib/flink/lib/flink-dist_2.10-1.1.4.jar | grep
> HttpConnectionParams
> 2485 Tue Jan 01 00:00:00 UTC 1980
> org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
> 3479 Tue Jan 01 00:00:00 UTC 1980
> org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
>
>
>
> When I build Flink 1.2.0 in BigTop, here's shading for the jar found in the
> RPM:
>
>
>
> jar -tvf flink-dist_2.10-1.2.0.jar | grep HttpConnectionParams
> 2392 Tue Jan 01 00:00:00 GMT 1980
> org/apache/commons/httpclient/params/HttpConnectionParams.class
> 2485 Tue Jan 01 00:00:00 GMT 1980
> org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
> 3479 Tue Jan 01 00:00:00 GMT 1980
> org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
> 2868 Tue Jan 01 00:00:00 GMT 1980
> org/apache/http/params/HttpConnectionParams.class
>
>
>
> I thought maybe it was some strange thing going on with BigTop, so then I
> tried just straight building Flink 1.2.0 (outside BigTop) and get the same
> shading:
>
>
>
> jar -tvf flink-dist_2.10-1.2.0.jar | grep HttpConnectionParams
>
>   2485 Fri Mar 17 05:41:16 GMT 2017
> org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
>
>   3479 Fri Mar 17 05:41:16 GMT 2017
> org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
>
>   2392 Fri Mar 17 05:41:24 GMT 2017
> org/apache/commons/httpclient/params/HttpConnectionParams.class
>
>   2868 Fri Mar 17 05:41:24 GMT 2017
> org/apache/http/params/HttpConnectionParams.class
>
>
>
> And, yes, this is after going into flink-dist and running mvn clean install
> again since I am using Maven 3.3.x.
>
>
>
> Here's a snippet from my Maven version:
>
> mvn -version
>
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> 2015-11-10T16:41:47+00:00)
>
> Maven home: /usr/local/apache-maven
>
> Java version: 1.8.0_121, vendor: Oracle Corporation
>
>
>
> Any ideas on what my problem might be here?
>
>
>
> Thanks,
>
> Craig
>
>



Return of Flink shading problems in 1.2.0

2017-03-16 Thread Foster, Craig
Hi:
A few months ago, I was building Flink and ran into shading issues for 
flink-dist as described in your docs. We resolved this in BigTop by adding the 
correct way to build flink-dist in the do-component-build script and everything 
was fine after that.

Now, I’m running into issues doing the same now in Flink 1.2.0 and I’m trying 
to figure out what’s changed and how to fix it. Here’s how the flink-dist jar 
looks with proper shading:

jar -tvf /usr/lib/flink/lib/flink-dist_2.10-1.1.4.jar | grep 
HttpConnectionParams
2485 Tue Jan 01 00:00:00 UTC 1980 
org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
3479 Tue Jan 01 00:00:00 UTC 1980 
org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class

When I build Flink 1.2.0 in BigTop, here’s shading for the jar found in the RPM:

jar -tvf flink-dist_2.10-1.2.0.jar | grep HttpConnectionParams
2392 Tue Jan 01 00:00:00 GMT 1980 
org/apache/commons/httpclient/params/HttpConnectionParams.class
2485 Tue Jan 01 00:00:00 GMT 1980 
org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
3479 Tue Jan 01 00:00:00 GMT 1980 
org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
2868 Tue Jan 01 00:00:00 GMT 1980 
org/apache/http/params/HttpConnectionParams.class

I thought maybe it was some strange thing going on with BigTop, so then I tried 
just straight building Flink 1.2.0 (outside BigTop) and get the same shading:

jar -tvf flink-dist_2.10-1.2.0.jar | grep HttpConnectionParams
  2485 Fri Mar 17 05:41:16 GMT 2017 
org/apache/flink/hadoop/shaded/org/apache/commons/httpclient/params/HttpConnectionParams.class
  3479 Fri Mar 17 05:41:16 GMT 2017 
org/apache/flink/hadoop/shaded/org/apache/http/params/HttpConnectionParams.class
  2392 Fri Mar 17 05:41:24 GMT 2017 
org/apache/commons/httpclient/params/HttpConnectionParams.class
  2868 Fri Mar 17 05:41:24 GMT 2017 
org/apache/http/params/HttpConnectionParams.class

And, yes, this is after going into flink-dist and running mvn clean install 
again since I am using Maven 3.3.x.

Here’s a snippet from my Maven version:
mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 
2015-11-10T16:41:47+00:00)
Maven home: /usr/local/apache-maven
Java version: 1.8.0_121, vendor: Oracle Corporation

Any ideas on what my problem might be here?

Thanks,
Craig



Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Foster, Craig
Are connectors being included in the 1.2.0 release or do you mean Kafka 
specifically?

From: Fabian Hueske 
Reply-To: "user@flink.apache.org" 
Date: Tuesday, January 17, 2017 at 7:10 AM
To: "user@flink.apache.org" 
Subject: Re: Zeppelin: Flink Kafka Connector

One thing to add: Flink 1.2.0 has not been released yet.
The FlinkKafkaConsumer010 is only available in a SNAPSHOT release or the first 
release candidate (RC0).
Best, Fabian

2017-01-17 16:08 GMT+01:00 Timo Walther:
You are using an old version of Flink (0.10.2). A FlinkKafkaConsumer010 was not 
present at that time. You need to upgrade to Flink 1.2.

Timo


On 17/01/17 at 15:58, Neil Derraugh wrote:

This is really a Zeppelin question, and I’ve already posted to the user list 
there.  I’m just trying to draw in as many relevant eyeballs as possible.  If 
you can help please reply on the Zeppelin mailing list.

In my Zeppelin notebook I’m having a problem importing the Kafka streaming 
library for Flink.

I added org.apache.flink:flink-connector-kafka_2.11:0.10.2 to the Dependencies 
on the Flink interpreter.

The Flink interpreter runs code, just not if I have the following import.
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

I get this error:
:72: error: object FlinkKafkaConsumer010 is not a member of package 
org.apache.flink.streaming.connectors.kafka
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

Am I doing something wrong here?

Neil
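
For reference, once on Flink 1.2 with the matching Kafka 0.10 connector 
(flink-connector-kafka-0.10_2.11) on the interpreter's classpath, the consumer 
is used roughly like this (broker address, group id, and topic are 
placeholders):

// org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "zeppelin-test");

DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props));
stream.print();
env.execute("Kafka 0.10 example");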




Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Foster, Craig
I would suggest using EMRFS anyway, which is the way to access the S3 file 
system from EMR (using the same s3:// prefixes). That said, you will run into 
the same shading issues in our build until the next release, which is coming up 
relatively shortly.



From: Robert Metzger 
Reply-To: "user@flink.apache.org" 
Date: Wednesday, November 23, 2016 at 8:24 AM
To: "user@flink.apache.org" 
Subject: Re: S3 checkpointing in AWS in Frankfurt

Hi Jonathan,

have you tried using Amazon's latest EMR Hadoop distribution? Maybe they've 
fixed the issue in their distribution for older Hadoop releases?

On Wed, Nov 23, 2016 at 4:38 PM, Scott Kidder wrote:
Hi Jonathan,

You might be better off creating a small Hadoop HDFS cluster just for the 
purpose of storing Flink checkpoint & savepoint data. Like you, I tried using 
S3 to persist Flink state, but encountered AWS SDK issues and felt like I was 
going down an ill-advised path. I then created a small 3-node HDFS cluster in 
the same region as my Flink hosts but distributed across 3 AZs. The 
checkpointing is very fast and, most importantly, just works.

Is there a firm requirement to use S3, or could you use HDFS instead?

Best,

--Scott Kidder

On Tue, Nov 22, 2016 at 11:52 PM, Jonathan Share wrote:
Hi,

I'm interested in hearing if anyone else has experience with using Amazon S3 as 
a state backend in the Frankfurt region. For political reasons we've been asked 
to keep all European data in Amazon's Frankfurt region. This causes a problem, 
as the S3 endpoint in Frankfurt requires the use of AWS Signature Version 4 
("This new Region supports only Signature Version 4" [1]), and this doesn't 
appear to work with the Hadoop version that Flink is built against [2].

After some hacking we have managed to create a Docker image with a build of 
Flink 1.2 master, copying over jar files from the Hadoop 3.0.0-alpha1 package, 
and this appears to work for the most part, but we still suffer from some 
classpath problems (conflicts between the AWS API used in Hadoop and the one we 
want to use in our streams for interacting with Kinesis), and the whole thing 
feels a little fragile. Has anyone else tried this? Is there a simpler solution?

As a follow-up question, we saw that with checkpointing set to 1 second on 
three relatively simple streams, our S3 costs were higher than the EC2 costs 
for our entire infrastructure. This seems slightly disproportionate. For now we 
have reduced the checkpointing interval to 10 seconds, and that has greatly 
improved the cost projections graphed via Amazon CloudWatch, but I'm interested 
in hearing other people's experience with this. Is that the kind of billing 
level we can expect, or is this a symptom of a misconfiguration? Is this a 
setup others are using? As we are using Kinesis as the source for all streams, 
I don't see a huge risk with larger checkpoint intervals, and our sinks are 
designed to mostly tolerate duplicates (some improvements can be made).
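
For reference, the checkpoint interval and checkpoint location are set per job 
roughly like this (the S3 path is a placeholder):

// org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
// org.apache.flink.runtime.state.filesystem.FsStateBackend
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// checkpoint every 10 seconds instead of every second
env.enableCheckpointing(10000);
// keep checkpoint data in S3 (placeholder path)
env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"));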

Thanks in advance
Jonathan


[1] https://aws.amazon.com/blogs/aws/aws-region-germany/
[2] https://issues.apache.org/jira/browse/HADOOP-13324




Re: flink-dist shading

2016-11-22 Thread Foster, Craig
Thanks Max. Yep, I just confirmed it works. 

On 11/22/16, 2:09 AM, "Maximilian Michels" <m...@apache.org> wrote:

Hi Craig,

I've left a comment on the original Maven JIRA issue to revive the
discussion. For BigTop, you can handle this in the build script by
building flink-dist again after a successful build. That will always
work independently of the Maven 3.x version.

-Max


On Mon, Nov 21, 2016 at 6:27 PM, Foster, Craig <foscr...@amazon.com> wrote:
> Thanks for explaining, Robert and Gordon. For whatever it's worth, I'll
> comment on the original Maven issue, which seems to be affecting many people.
> I am trying to create RPMs using BigTop and am still not quite sure how to
> handle this case. I guess what I could do is build the parent first in our
> script and then build flink-dist again as described in the instructions. I'll
> try that out and see if it resolves the issue.
>
>
>
> From: Robert Metzger <rmetz...@apache.org>
> Reply-To: "user@flink.apache.org" <user@flink.apache.org>
> Date: Saturday, November 19, 2016 at 4:08 AM
> To: "user@flink.apache.org" <user@flink.apache.org>
> Subject: Re: flink-dist shading
>
>
>
> Hi Craig,
>
> I also received only this email (and I'm a moderator of the dev@ list, so
> the message never made it into Apache's infra)
>
>
>
> When this issue was first reported [1][2] I asked on the Maven mailing list
> what's going on [3]. I think this JIRA contains the most information on the
> issue: https://issues.apache.org/jira/browse/MNG-5899
>
> It doesn't seem that Maven is going to fix the issue anytime soon.
>
>
>
> One idea I had regarding this issue was to print a warning in the maven
> output if we detect Maven 3.3 at build time.
>
>
>
>
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/flink-dist-packaging-including-unshaded-classes-td9504.html
>
> [2] https://issues.apache.org/jira/browse/FLINK-3158
>
> [3] http://comments.gmane.org/gmane.comp.jakarta.turbine.maven.user/137621
>
>
>
>
>
>
>
> On Fri, Nov 18, 2016 at 7:23 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
> Hi Craig,
>
>
>
> I think the email wasn't sent to the ‘dev’ list, somehow.
>
>
>
> Have you tried this:
>
>
>
> mvn clean install -DskipTests
>
> # In Maven 3.3 the shading of flink-dist doesn't work properly in one run,
> so we need to run mvn for flink-dist again.
>
> cd flink-dist
>
> mvn clean install -DskipTests
>
> I agree that it’ll affect downstream users who need to build Flink
> themselves, and would be best if it can be resolved.
>
> The above is still more or less a “workaround”, but since I don’t really
> know the reason why the newer Maven versions won’t properly shade, we’ll
> probably need to wait for others more knowledgeable about the build
> infrastructure to chime in and see if there’s a good long-term solution.
>
>
>
> Best Regards,
>
> Gordon
>
> On November 19, 2016 at 8:48:32 AM, Foster, Craig (foscr...@amazon.com)
> wrote:
>
> I’m not even sure this was delivered to the ‘dev’ list but I’ll go ahead and
> forward the same email to the ‘user’ list since I haven’t seen a response.
>
> ---
>
>
>
> I’m following up on the issue in FLINK-5013 about flink-dist specifically
> requiring Maven 3.0.5 through to <3.3. This affects people who build Flink
> with BigTop (not only EMR), so I’m wondering about the context and how we
> can properly shade the Apache HTTP libraries so that flink-dist can be built
> with a current version of Maven. Any insight into this would be helpful.
>
>
>
> Thanks!
>
> Craig
>
>
>
>




Re: flink-dist shading

2016-11-21 Thread Foster, Craig
Thanks for explaining, Robert and Gordon. For whatever it's worth, I'll comment 
on the original Maven issue, which seems to be affecting many people. I am trying 
to create RPMs using BigTop and am still not quite sure how to handle this 
case. I guess what I could do is build the parent first in our script and then 
build flink-dist again as described in the instructions. I’ll try that out and 
see if it resolves the issue.

From: Robert Metzger <rmetz...@apache.org>
Reply-To: "user@flink.apache.org" <user@flink.apache.org>
Date: Saturday, November 19, 2016 at 4:08 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: flink-dist shading

Hi Craig,
I also received only this email (and I'm a moderator of the dev@ list, so the 
message never made it into Apache's infra)

When this issue was first reported [1][2] I asked on the Maven mailing list 
what's going on [3]. I think this JIRA contains the most information on the 
issue: https://issues.apache.org/jira/browse/MNG-5899
It doesn't seem that Maven is going to fix the issue anytime soon.

One idea I had regarding this issue was to print a warning in the maven output 
if we detect Maven 3.3 at build time.


[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/flink-dist-packaging-including-unshaded-classes-td9504.html
[2] https://issues.apache.org/jira/browse/FLINK-3158
[3] http://comments.gmane.org/gmane.comp.jakarta.turbine.maven.user/137621



On Fri, Nov 18, 2016 at 7:23 PM, Tzu-Li (Gordon) Tai 
<tzuli...@apache.org> wrote:
Hi Craig,

I think the email wasn't sent to the ‘dev’ list, somehow.

Have you tried this:


mvn clean install -DskipTests

# In Maven 3.3 the shading of flink-dist doesn't work properly in one run, so 
we need to run mvn for flink-dist again.

cd flink-dist

mvn clean install -DskipTests
I agree that it’ll affect downstream users who need to build Flink themselves, 
and would be best if it can be resolved.
The above is still more or less a “workaround”, but since I don’t really know 
the reason why the newer Maven versions won’t properly shade, we’ll probably 
need to wait for others more knowledgeable about the build infrastructure to 
chime in and see if there’s a good long-term solution.

Best Regards,
Gordon

On November 19, 2016 at 8:48:32 AM, Foster, Craig (foscr...@amazon.com) wrote:
I’m not even sure this was delivered to the ‘dev’ list but I’ll go ahead and 
forward the same email to the ‘user’ list since I haven’t seen a response.
---

I’m following up on the issue in FLINK-5013 about flink-dist specifically 
requiring Maven 3.0.5 through to <3.3. This affects people who build Flink with 
BigTop (not only EMR), so I’m wondering about the context and how we can 
properly shade the Apache HTTP libraries so that flink-dist can be built with a 
current version of Maven. Any insight into this would be helpful.

Thanks!
Craig




Re: flink-dist shading

2016-11-18 Thread Foster, Craig
I’m not even sure this was delivered to the ‘dev’ list but I’ll go ahead and 
forward the same email to the ‘user’ list since I haven’t seen a response.
---

I’m following up on the issue in FLINK-5013 about flink-dist specifically 
requiring Maven 3.0.5 through to <3.3. This affects people who build Flink with 
BigTop (not only EMR), so I’m wondering about the context and how we can 
properly shade the Apache HTTP libraries so that flink-dist can be built with a 
current version of Maven. Any insight into this would be helpful.

Thanks!
Craig



Re: Wikiedit QuickStart with Kinesis

2016-09-01 Thread Foster, Craig
Oh, in that case, maybe I should look into using the KCL. I'm just using boto 
and boto3, which are definitely having different problems, but both related to 
the encoding.
 
boto3 prints *something*:
 
(.96.129.59,-20)'(01:541:4305:C70:10B4:FA8C:3CF9:B9B0,0(Patrick 
Barlane,0(Nedrutland,12(GreenC bot,15(Bamyers99,-170(Mean as 
custard,661)ж?U¨p?&"1w?
??
(RaphaelQS,-3(.44.211.32,50(JMichael22,1298)
   (Wbm1058,-9)Y?Z/r(???
 
But boto just gives an exception:
 
: 'utf8' codec can't decode bytes in 
position 0-2: invalid continuation byte
 
It does this even when getting a response object.
 
Thanks for your help! I'll try with the KCL before changing my 
SerializationSchema just yet.


On 9/1/16, 10:37 AM, "Tzu-Li Tai"  wrote:

I’m afraid the “AUTO” option on the Kinesis producer is actually bugged, so
the internally used KPL library doesn't correctly pick up credentials with the
default credential provider chain. I’ve just filed a JIRA for this:
https://issues.apache.org/jira/browse/FLINK-4559

Regarding the data encoding:
So, you’re using a SimpleStringSchema for serialization of the tuple
strings. The SimpleStringSchema simply calls String.getBytes() to serialize
the string into a byte array, so the data is encoded with the platform’s
default charset.
Internally, the SimpleStringSchema is wrapped within a
KinesisSerializationSchema, which ultimately wraps the bytes within
a ByteBuffer that is added to the internal KPL for writing to the streams.
You can actually also choose to directly use a KinesisSerializationSchema to
invoke the FlinkKinesisProducer.

Also note that since FlinkKinesisProducer uses KPL, there might be some
"magic bytes" in the encoded data added by KPL to aggregate multiple records
when writing to streams. If you’re using the AWS KCL (Kinesis Client
Library) or the FlinkKinesisConsumer to read the data, there shouldn’t be a
problem decoding them (the FlinkKinesisConsumer uses a class from KCL to
help decode aggregated records sent by KPL).
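
A minimal sketch of such a schema, encoding explicitly as UTF-8 and keeping the 
producer's default stream (the class name here is illustrative):

// org.apache.flink.streaming.connectors.kinesis.serialization.KinesisSerializationSchema
// (plus java.nio.ByteBuffer and java.nio.charset.StandardCharsets)
public class Utf8StringSchema implements KinesisSerializationSchema<String> {
    @Override
    public ByteBuffer serialize(String element) {
        // encode explicitly as UTF-8 instead of the platform default charset
        return ByteBuffer.wrap(element.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public String getTargetStream(String element) {
        // returning null keeps the producer's configured default stream
        return null;
    }
}

// used in place of SimpleStringSchema:
// new FlinkKinesisProducer<>(new Utf8StringSchema(), producerConfig)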

Let me know if you bump into any other problems ;)

Regards,
Gordon







Re: Wikiedit QuickStart with Kinesis

2016-09-01 Thread Foster, Craig
Thanks Gordon. I think I changed all my versions to match the version against 
which I built the Kinesis connector, so you were right. That seems to have moved me 
further. I can write to streams now. Now all I need to do is figure out how 
Kinesis is encoding it. :)

One issue with the "AUTO" option is that whatever credentials it finds, it 
doesn't seem to have PutRecords permissions even though the AWS IAM role I am 
using ostensibly has that...so I am back to having credentials in code which 
isn't necessarily a best practice. I haven't figured that part out yet either.

From: "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Reply-To: "user@flink.apache.org" <user@flink.apache.org>
Date: Thursday, September 1, 2016 at 2:25 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Wikiedit QuickStart with Kinesis

Hi Craig,

I’ve just run a simple test on this and there should be no problem.

What Flink version were you using (the archetype version used with the Flink 
Quickstart Maven Archetype)?
Also, on which branch / commit was the Kinesis connector built? Seeing that 
you’ve used the “AUTO”
credentials provider option, I’m assuming it’s built on the master branch and 
not a release branch (the “AUTO”
option wasn’t included in any of the release branches yet).

So I’m suspecting it’s due to a version conflict between the two. If yes, you 
should build the Kinesis connector
with the same release version branch as the Flink version you’re using.
Could you check and see if the problem remains? Thanks!

Regards,
Gordon



On September 1, 2016 at 1:34:19 AM, Foster, Craig (foscr...@amazon.com) wrote:
Hi:
I am using the following WikiEdit example:
https://ci.apache.org/projects/flink/flink-docs-master/quickstart/run_example_quickstart.html

It works when printing the contents to a file or stdout.

But I wanted to modify it to use Kinesis instead of Kafka. So instead of the 
Kafka part, I put:

Properties producerConfig = new Properties();
producerConfig.put(ProducerConfigConstants.AWS_REGION, "us-east-1");
producerConfig.put(ProducerConfigConstants.AWS_CREDENTIALS_PROVIDER, "AUTO");

FlinkKinesisProducer<String> kinesis = new FlinkKinesisProducer<>(
    new SimpleStringSchema(), producerConfig);
kinesis.setFailOnError(true);
kinesis.setDefaultStream("my-flink-stream");
kinesis.setDefaultPartition("0");

result
    .map(new MapFunction<Tuple2<String, Long>, String>() {
        @Override
        public String map(Tuple2<String, Long> tuple) {
            return tuple.toString();
        }
    })
    .addSink(kinesis);

see.execute();


But I get the following error:

2016-08-31 17:05:41,541 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom 
Source (1/1) (2f7d339588fec18e0f2617439ee9be6d) switched from RUNNING to 
CANCELING

2016-08-31 17:05:41,542 INFO  org.apache.flink.yarn.YarnJobManager  
- Status of job 43a13707d92da260827f37968597c187 () changed to 
FAILING.

java.lang.Exception: Serialized representation of 
org.apache.flink.streaming.runtime.tasks.TimerException: 
java.lang.RuntimeException: Could not forward element to next operator

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:803)

at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)
Searching Google doesn't yield many things that seem to work. Is there 
somewhere I should look for a root cause? I looked in the full log file but 
it's not much more than this stacktrace.


Wikiedit QuickStart with Kinesis

2016-08-31 Thread Foster, Craig
Hi:
I am using the following WikiEdit example:
https://ci.apache.org/projects/flink/flink-docs-master/quickstart/run_example_quickstart.html

It works when printing the contents to a file or stdout.

But I wanted to modify it to use Kinesis instead of Kafka. So instead of the 
Kafka part, I put:

Properties producerConfig = new Properties();
producerConfig.put(ProducerConfigConstants.AWS_REGION, "us-east-1");
producerConfig.put(ProducerConfigConstants.AWS_CREDENTIALS_PROVIDER, "AUTO");

FlinkKinesisProducer<String> kinesis = new FlinkKinesisProducer<>(
    new SimpleStringSchema(), producerConfig);
kinesis.setFailOnError(true);
kinesis.setDefaultStream("my-flink-stream");
kinesis.setDefaultPartition("0");

result
    .map(new MapFunction<Tuple2<String, Long>, String>() {
        @Override
        public String map(Tuple2<String, Long> tuple) {
            return tuple.toString();
        }
    })
    .addSink(kinesis);

see.execute();


But I get the following error:

2016-08-31 17:05:41,541 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom 
Source (1/1) (2f7d339588fec18e0f2617439ee9be6d) switched from RUNNING to 
CANCELING

2016-08-31 17:05:41,542 INFO  org.apache.flink.yarn.YarnJobManager  
- Status of job 43a13707d92da260827f37968597c187 () changed to 
FAILING.

java.lang.Exception: Serialized representation of 
org.apache.flink.streaming.runtime.tasks.TimerException: 
java.lang.RuntimeException: Could not forward element to next operator

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:803)

at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)
Searching Google doesn't yield many things that seem to work. Is there 
somewhere I should look for a root cause? I looked in the full log file but 
it's not much more than this stacktrace.


Re: Setting number of TaskManagers

2016-08-24 Thread Foster, Craig
Oh, sorry, I didn't mention that I was using YARN and don't necessarily want to 
set it with the command-line option.

From: Greg Hogan <c...@greghogan.com>
Reply-To: "user@flink.apache.org" <user@flink.apache.org>
Date: Wednesday, August 24, 2016 at 12:07 PM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Setting number of TaskManagers

The number of TaskManagers will be equal to the number of entries in the 
conf/slaves file.

On Wed, Aug 24, 2016 at 3:04 PM, Foster, Craig <foscr...@amazon.com> wrote:
Is there a way to set the number of TaskManagers using a configuration file or 
environment variable? I'm looking at the docs for it and it says you can set 
slots but not the number of TMs.

Thanks,
Craig



Setting number of TaskManagers

2016-08-24 Thread Foster, Craig
Is there a way to set the number of TaskManagers using a configuration file or 
environment variable? I'm looking at the docs for it and it says you can set 
slots but not the number of TMs.

Thanks,
Craig


WordCount w/ YARN and EMR local filesystem and/or HDFS

2016-08-23 Thread Foster, Craig
I'm trying to use the wordcount example with the local file system, but it's 
giving me a permissions error or not finding the file. It works just fine for 
input and output on S3. What is the correct URI usage for the local file system 
and HDFS?

I have installed Flink on EMR and am just using the flink run script to start 
the job:

% flink run -m yarn-cluster -yn 2 
/usr/lib/flink/examples/streaming/WordCount.jar --input 
file:///home/hadoop/LICENSE.txt



The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Failed to submit job 8a9efe4f99c5cad5897c18146fb66309 
(Streaming WordCount)
at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
at 
org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:65)
at 
org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
at 
org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1192)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1243)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to 
submit job 8a9efe4f99c5cad5897c18146fb66309 (Streaming WordCount)
at 
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1120)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:381)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:153)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:107)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Creating the input splits 
caused an error: File file:/home/hadoop/LICENSE.txt does not exist or the user 
running Flink ('yarn') has insufficient permissions to access it.
at 
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:172)
at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:679)
at 
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1026)
... 25 more
Caused by: java.io.FileNotFoundException: File file:/home/hadoop/LICENSE.txt 
does not exist or the user