Re: Processing S3 data with Apache Flink

2015-11-21 Thread Konstantin Knauf
Hi Robert,

thanks a lot, it's working now. Actually, it also says "directory" in
the description. So I should have known :/

One additional question though: if I use the Flink binary for Hadoop
1.2.1 and run Flink in standalone mode, should I use the *-hadoop1
dependencies even if I am not interacting with HDFS 1.x?

Cheers,

Konstantin

On 21.11.2015 14:52, Robert Metzger wrote:
> Hi,
> 
> It seems that you've set the "fs.hdfs.hadoopconf" configuration
> parameter to a file. I think you have to set it to the directory containing
> the configuration.
> Sorry, I know that's not very intuitive, but in Hadoop the settings are
> spread across different files: (hdfs|yarn|core)-site.xml.
> 
> 
> On Sat, Nov 21, 2015 at 12:48 PM, Konstantin Knauf wrote:
> 
> Hi Ufuk,
> 
> sorry for not getting back to you for so long, and thanks for your
> answer. Unfortunately, the problem persists. Running the job from the IDE
> works (with core-site.xml on the classpath), running it in local standalone
> mode does not. AccessKeyId and SecretAccessKey are not found.
> 
> Attached is the jobmanager log on DEBUG level. The core-site.xml is
> definitely at the configured location.
> 
> I am now on version 0.10.0 and using the binaries for Hadoop 1.2.1 to
> run the jar in local mode. Do I have to use the Hadoop 2.x version for
> this to work? I have put hadoop-common-2.3.jar into the flink lib
> folder.
> 
> I don't know if it is relevant (but it seems to be related): when I run
> the job from my IDE, I get the warning:
> 
> 2015-11-21 12:43:11 WARN  NativeCodeLoader:62 - Unable to load
> native-hadoop library for your platform... using builtin-java classes
> where applicable
> 
> Cheers and thank you,
> 
> Konstantin
> 
> 
> On 14.10.2015 11:44, Ufuk Celebi wrote:
> >
> >> On 10 Oct 2015, at 22:59, snntr wrote:
> >>
> >> Hey everyone,
> >>
> >> I was having the same problem with S3 and found this thread very useful.
> >> Everything works fine now, when I start Flink from my IDE, but when I run
> >> the jar in local mode I keep getting
> >>
> >> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
> >> must be specified as the username or password (respectively) of a s3n URL,
> >> or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
> >> properties (respectively).
> >>
> >> I have set fs.hdfs.hadoopconf to point to a core-site.xml on my local
> >> machine with the required properties. What am I missing?
> >>
> >> Any advice is highly appreciated ;)
> >
> > This looks like a problem with picking up the Hadoop config. Can
> > you look into the logs to check whether the configuration is picked
> > up? Change the log settings to DEBUG in log/log4j.properties for
> > this. And can you provide the complete stack trace?
> >
> > – Ufuk
> >
> >
> 
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> 
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
> 
> 

-- 
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082


Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
Hi,

great to hear that it's working. I've updated the documentation (for 1.0)
and made the word "directory" bold ;)

You should try to match your Hadoop version as closely as possible.
If you are not using HDFS at all, it doesn't matter which version of
Flink you download.
When using Hadoop 2.x, I'd recommend at least the Flink version for
Hadoop 2.3.0.
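
To make the *-hadoop1 dependency question concrete, here is a minimal sketch
of the Maven coordinates, assuming the "-hadoop1" version-suffix convention
used by the Flink 0.10 releases (the fragment is illustrative, not taken from
the thread; verify the exact coordinates against the downloads page):

<!-- Hypothetical POM fragment: Flink dependencies matching a Hadoop 1.x binary.
     Dropping the "-hadoop1" suffix selects the default Hadoop 2.x build. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>0.10.0-hadoop1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients</artifactId>
  <version>0.10.0-hadoop1</version>
</dependency>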


On Sat, Nov 21, 2015 at 3:13 PM, Konstantin Knauf <
konstantin.kn...@tngtech.com> wrote:

> Hi Robert,
>
> thanks a lot, it's working now. Actually, it also says "directory" in
> the description. So I should have known :/
>
> One additional question though: if I use the Flink binary for Hadoop
> 1.2.1 and run Flink in standalone mode, should I use the *-hadoop1
> dependencies even if I am not interacting with HDFS 1.x?
>
> Cheers,
>
> Konstantin
>
> On 21.11.2015 14:52, Robert Metzger wrote:
> > Hi,
> >
> > It seems that you've set the "fs.hdfs.hadoopconf" configuration
> > parameter to a file. I think you have to set it to the directory containing
> > the configuration.
> > Sorry, I know that's not very intuitive, but in Hadoop the settings are
> > spread across different files: (hdfs|yarn|core)-site.xml.
> >
> >
> > On Sat, Nov 21, 2015 at 12:48 PM, Konstantin Knauf wrote:
> >
> > Hi Ufuk,
> >
> > sorry for not getting back to you for so long, and thanks for your
> > answer. Unfortunately, the problem persists. Running the job from the IDE
> > works (with core-site.xml on the classpath), running it in local standalone
> > mode does not. AccessKeyId and SecretAccessKey are not found.
> >
> > Attached is the jobmanager log on DEBUG level. The core-site.xml is
> > definitely at the configured location.
> >
> > I am now on version 0.10.0 and using the binaries for Hadoop 1.2.1 to
> > run the jar in local mode. Do I have to use the Hadoop 2.x version for
> > this to work? I have put hadoop-common-2.3.jar into the flink lib
> > folder.
> >
> > I don't know if it is relevant (but it seems to be related): when I run
> > the job from my IDE, I get the warning:
> >
> > 2015-11-21 12:43:11 WARN  NativeCodeLoader:62 - Unable to load
> > native-hadoop library for your platform... using builtin-java classes
> > where applicable
> >
> > Cheers and thank you,
> >
> > Konstantin
> >
> >
> > On 14.10.2015 11:44, Ufuk Celebi wrote:
> > >
> > >> On 10 Oct 2015, at 22:59, snntr wrote:
> > >> Hey everyone,
> > >>
> > >> I was having the same problem with S3 and found this thread very useful.
> > >> Everything works fine now, when I start Flink from my IDE, but when I run
> > >> the jar in local mode I keep getting
> > >>
> > >> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
> > >> must be specified as the username or password (respectively) of a s3n URL,
> > >> or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
> > >> properties (respectively).
> > >>
> > >> I have set fs.hdfs.hadoopconf to point to a core-site.xml on my local
> > >> machine with the required properties. What am I missing?
> > >>
> > >> Any advice is highly appreciated ;)
> > >
> > > This looks like a problem with picking up the Hadoop config. Can
> > > you look into the logs to check whether the configuration is picked
> > > up? Change the log settings to DEBUG in log/log4j.properties for
> > > this. And can you provide the complete stack trace?
> > >
> > > – Ufuk
> > >
> > >
> >
> > --
> > Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> > 
> > TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> > Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> > Sitz: Unterföhring * Amtsgericht München * HRB 135082
> >
> >
>
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>


Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
Hi,

It seems that you've set the "fs.hdfs.hadoopconf" configuration parameter
to a file. I think you have to set it to the directory containing the
configuration.
Sorry, I know that's not very intuitive, but in Hadoop the settings are
spread across different files: (hdfs|yarn|core)-site.xml.
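
For illustration, a minimal sketch of the relevant flink-conf.yaml entry,
assuming the Hadoop config files live under /etc/hadoop/conf (the path is a
placeholder; adjust it to your setup):

# flink-conf.yaml: point to the directory that contains core-site.xml,
# hdfs-site.xml, etc., not to one of the files itself
fs.hdfs.hadoopconf: /etc/hadoop/conf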


On Sat, Nov 21, 2015 at 12:48 PM, Konstantin Knauf <
konstantin.kn...@tngtech.com> wrote:

> Hi Ufuk,
>
> sorry for not getting back to you for so long, and thanks for your
> answer. Unfortunately, the problem persists. Running the job from the IDE
> works (with core-site.xml on the classpath), running it in local standalone
> mode does not. AccessKeyId and SecretAccessKey are not found.
>
> Attached is the jobmanager log on DEBUG level. The core-site.xml is
> definitely at the configured location.
>
> I am now on version 0.10.0 and using the binaries for Hadoop 1.2.1 to
> run the jar in local mode. Do I have to use the Hadoop 2.x version for
> this to work? I have put hadoop-common-2.3.jar into the flink lib folder.
>
> I don't know if it is relevant (but it seems to be related): when I run
> the job from my IDE, I get the warning:
>
> 2015-11-21 12:43:11 WARN  NativeCodeLoader:62 - Unable to load
> native-hadoop library for your platform... using builtin-java classes
> where applicable
>
> Cheers and thank you,
>
> Konstantin
>
>
> On 14.10.2015 11:44, Ufuk Celebi wrote:
> >
> >> On 10 Oct 2015, at 22:59, snntr  wrote:
> >>
> >> Hey everyone,
> >>
> >> I was having the same problem with S3 and found this thread very useful.
> >> Everything works fine now, when I start Flink from my IDE, but when I run
> >> the jar in local mode I keep getting
> >>
> >> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
> >> must be specified as the username or password (respectively) of a s3n URL,
> >> or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
> >> properties (respectively).
> >>
> >> I have set fs.hdfs.hadoopconf to point to a core-site.xml on my local
> >> machine with the required properties. What am I missing?
> >>
> >> Any advice is highly appreciated ;)
> >
> > This looks like a problem with picking up the Hadoop config. Can you
> > look into the logs to check whether the configuration is picked up? Change
> > the log settings to DEBUG in log/log4j.properties for this. And can you
> > provide the complete stack trace?
> >
> > – Ufuk
> >
> >
>
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>


Re: Processing S3 data with Apache Flink

2015-11-21 Thread Konstantin Knauf
Hi Ufuk,

sorry for not getting back to you for so long, and thanks for your
answer. Unfortunately, the problem persists. Running the job from the IDE
works (with core-site.xml on the classpath), running it in local standalone
mode does not. AccessKeyId and SecretAccessKey are not found.

Attached is the jobmanager log on DEBUG level. The core-site.xml is
definitely at the configured location.

I am now on version 0.10.0 and using the binaries for Hadoop 1.2.1 to
run the jar in local mode. Do I have to use the Hadoop 2.x version for
this to work? I have put hadoop-common-2.3.jar into the flink lib folder.

I don't know if it is relevant (but it seems to be related): when I run
the job from my IDE, I get the warning:

2015-11-21 12:43:11 WARN  NativeCodeLoader:62 - Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable

Cheers and thank you,

Konstantin


On 14.10.2015 11:44, Ufuk Celebi wrote:
> 
>> On 10 Oct 2015, at 22:59, snntr  wrote:
>>
>> Hey everyone, 
>>
>> I was having the same problem with S3 and found this thread very useful.
>> Everything works fine now, when I start Flink from my IDE, but when I run
>> the jar in local mode I keep getting 
>>
>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
>> must be specified as the username or password (respectively) of a s3n URL,
>> or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
>> properties (respectively).
>>
>> I have set fs.hdfs.hadoopconf to point to a core-site.xml on my local
>> machine with the required properties. What am I missing?
>>
>> Any advice is highly appreciated ;)
> 
> This looks like a problem with picking up the Hadoop config. Can you look 
> into the logs to check whether the configuration is picked up? Change the log 
> settings to DEBUG in log/log4j.properties for this. And can you provide the 
> complete stack trace?
> 
> – Ufuk
> 
> 

-- 
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
12:29:36,538 DEBUG org.apache.hadoop.security.Groups -  Creating new Groups object
12:29:36,559 DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
12:29:36,590 DEBUG org.apache.hadoop.security.UserGroupInformation   - hadoop login
12:29:36,591 DEBUG org.apache.hadoop.security.UserGroupInformation   - hadoop login commit
12:29:36,593 DEBUG org.apache.hadoop.security.UserGroupInformation   - using local user:UnixPrincipal: kknauf
12:29:36,594 DEBUG org.apache.hadoop.security.UserGroupInformation   - UGI loginUser:kknauf
12:29:36,594 INFO  org.apache.flink.runtime.jobmanager.JobManager- 
12:29:36,594 INFO  org.apache.flink.runtime.jobmanager.JobManager-  Starting JobManager (Version: 0.10.0, Rev:ab2cca4, Date:10.11.2015 @ 13:50:14 UTC)
12:29:36,595 INFO  org.apache.flink.runtime.jobmanager.JobManager-  Current user: kknauf
12:29:36,595 INFO  org.apache.flink.runtime.jobmanager.JobManager-  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.66-b17
12:29:36,595 INFO  org.apache.flink.runtime.jobmanager.JobManager-  Maximum heap size: 736 MiBytes
12:29:36,595 INFO  org.apache.flink.runtime.jobmanager.JobManager-  JAVA_HOME: /usr/lib/jvm/java-8-oracle/
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager-  Hadoop version: 1.2.1
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager-  JVM Options:
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager- -Xms768m
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager- -Xmx768m
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager- -Dlog.file=/opt/flink-0.10.0/log/flink-kknauf-jobmanager-0-kknauf-ThinkPad-T440p.log
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager- -Dlog4j.configuration=file:/opt/flink-0.10.0/conf/log4j.properties
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager- -Dlogback.configurationFile=file:/opt/flink-0.10.0/conf/logback.xml
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager-  Program Arguments:
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager- --configDir
12:29:36,599 INFO  org.apache.flink.runtime.jobmanager.JobManager- /opt/flink-0.10.0/conf
12:29:36,599 INFO  

Re: Processing S3 data with Apache Flink

2015-10-20 Thread Stephan Ewen
@Konstantin (2): Can you try the workaround described by Robert, with the
"s3n" file system scheme?

We are removing the custom S3 connector now, simply reusing Hadoop's S3
connector for all cases.
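
In code, the workaround boils down to reading with an "s3n://" URI; a minimal
sketch, with the class name, bucket, and path as placeholders (Robert's full
program and core-site.xml are quoted further down the thread):

import org.apache.flink.api.java.ExecutionEnvironment;

public class S3nSmokeTest {
   public static void main(String[] args) throws Exception {
      // Bucket and path are placeholders; the credentials come from the
      // fs.s3n.* properties in the Hadoop core-site.xml that Flink picks up.
      ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
      env.readTextFile("s3n://my-bucket-name/path/to/input").print();
   }
}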

@Kostia:
You are right, there should be no broken stuff that is not clearly marked
as "beta". For the S3 connector, that was a problem in the testing on our
side and should not have happened.
In general, you can assume that stuff in "flink-contrib" is in beta status,
as well as the stuff in "flink-staging" (although much of the staging stuff
will graduate with the next release). All code not in these projects should
be working well. We test a lot, so there should not be many broken
cases like that S3 connector.

Greetings,
Stephan


On Wed, Oct 14, 2015 at 11:44 AM, Ufuk Celebi  wrote:

>
> > On 10 Oct 2015, at 22:59, snntr  wrote:
> >
> > Hey everyone,
> >
> > I was having the same problem with S3 and found this thread very useful.
> > Everything works fine now, when I start Flink from my IDE, but when I run
> > the jar in local mode I keep getting
> >
> > java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
> > must be specified as the username or password (respectively) of a s3n URL,
> > or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
> > properties (respectively).
> >
> > I have set fs.hdfs.hadoopconf to point to a core-site.xml on my local
> > machine with the required properties. What am I missing?
> >
> > Any advice is highly appreciated ;)
>
> This looks like a problem with picking up the Hadoop config. Can you look
> into the logs to check whether the configuration is picked up? Change the
> log settings to DEBUG in log/log4j.properties for this. And can you provide
> the complete stack trace?
>
> – Ufuk
>
>


Re: Processing S3 data with Apache Flink

2015-10-14 Thread Ufuk Celebi

> On 10 Oct 2015, at 22:59, snntr  wrote:
> 
> Hey everyone, 
> 
> I was having the same problem with S3 and found this thread very useful.
> Everything works fine now, when I start Flink from my IDE, but when I run
> the jar in local mode I keep getting 
> 
> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
> must be specified as the username or password (respectively) of a s3n URL,
> or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
> properties (respectively).
> 
> I have set fs.hdfs.hadoopconf to point to a core-site.xml on my local
> machine with the required properties. What am I missing?
> 
> Any advice is highly appreciated ;)

This looks like a problem with picking up the Hadoop config. Can you look into 
the logs to check whether the configuration is picked up? Change the log 
settings to DEBUG in log/log4j.properties for this. And can you provide the 
complete stack trace?
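
A minimal sketch of that change, assuming the default log/log4j.properties
shipped with Flink, where the root logger is set to INFO with a file appender
(only the level changes):

# log/log4j.properties: raise the level so the Hadoop config loading shows up
log4j.rootLogger=DEBUG, file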

– Ufuk



Re: Processing S3 data with Apache Flink

2015-10-06 Thread KOSTIANTYN Kudriavtsev
Hi Robert,

thank you very much for your input!

Have you tried that?
With org.apache.hadoop.fs.s3native.NativeS3FileSystem I moved forward, and
now got a new exception:


Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden

It's really strange, as I gave full permissions
to authenticated users and can fetch the target file via s3cmd or an S3 browser
from the same PC... I realize this question is not really for you, but perhaps
you have faced the same issue.

Thanks in advance!
Kostia

Thank you,
Konstantin Kudryavtsev

On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger  wrote:

> Hi Kostia,
>
> thank you for writing to the Flink mailing list. I actually started to try
> out our S3 File system support after I saw your question on StackOverflow
> [1].
> I found that our S3 connector is very broken. I had to resolve two more
> issues with it, before I was able to get the same exception you reported.
>
> Another Flink committer looked into the issue as well (the problem was
> confirmed) but there was no solution [2].
>
> So for now, I would say we have to assume that our S3 connector is not
> working. I will start a separate discussion at the developer mailing list
> to remove our S3 connector.
>
> The good news is that you can just use Hadoop's S3 File System
> implementation with Flink.
>
> I used this Flink program to verify it works:
>
> public class S3FileSystem {
>    public static void main(String[] args) throws Exception {
>       ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
>       DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
>       myLines.print();
>    }
> }
>
> Also, you need to make a Hadoop configuration file available to Flink.
> When running Flink locally in your IDE, just create a "core-site.xml" in
> the src/main/resources folder, with the following content:
>
> <configuration>
>
> <property>
>   <name>fs.s3n.awsAccessKeyId</name>
>   <value>putKeyHere</value>
> </property>
>
> <property>
>   <name>fs.s3n.awsSecretAccessKey</name>
>   <value>putSecretHere</value>
> </property>
>
> <property>
>   <name>fs.s3n.impl</name>
>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
> </property>
> </configuration>
>
> If you are running on a cluster, re-use the existing core-site.xml
> file (i.e., edit it) and point to its directory using Flink's
> fs.hdfs.hadoopconf configuration option.
>
> With these two things in place, you should be good to go.
>
> [1]
> http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
> [2]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-Amazon-S3-td946.html
>
> On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I'm trying to get Apache Flink 0.9.1 working on EMR, basically to read
>> data from S3. I tried the following path for data
>> s3://mybucket.s3.amazonaws.com/folder, but it throws me the following
>> exception:
>>
>> java.io.IOException: Cannot establish connection to Amazon S3:
>> com.amazonaws.services.s3.model.AmazonS3Exception: The request signature
>> we calculated does not match the signature you provided. Check your key
>> and signing method. (Service: Amazon S3; Status Code: 403;
>>
>> I added access and secret keys, so the problem is not here. I'm using
>> the standard region and gave read credentials to everyone.
>>
>> Any ideas how it can be fixed?
>>
>> Thank you in advance,
>> Kostia
>>
>
>


Re: Processing S3 data with Apache Flink

2015-10-06 Thread Robert Metzger
Hmm. I tried out the code I posted yesterday and it worked
immediately.
The security settings of AWS are sometimes a bit complicated.
I think there are some logs for S3 buckets, maybe they contain some more
information.

Maybe there are other users facing the same issue. Since the S3FileSystem
class is from Hadoop, I suspect the code is widely used, and you can
probably find answers to the most common problems on Google.


On Tue, Oct 6, 2015 at 1:07 PM, KOSTIANTYN Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> Hi Robert,
>
> thank you very much for your input!
>
> Have you tried that?
> With org.apache.hadoop.fs.s3native.NativeS3FileSystem I moved forward,
> and now got a new exception:
>
>
> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
> for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden
>
> It's really strange, as I gave full permissions
> to authenticated users and can fetch the target file via s3cmd or an S3 browser
> from the same PC... I realize this question is not really for you, but perhaps
> you have faced the same issue.
>
> Thanks in advance!
> Kostia
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger 
> wrote:
>
>> Hi Kostia,
>>
>> thank you for writing to the Flink mailing list. I actually started to
>> try out our S3 File system support after I saw your question on
>> StackOverflow [1].
>> I found that our S3 connector is very broken. I had to resolve two more
>> issues with it, before I was able to get the same exception you reported.
>>
>> Another Flink committer looked into the issue as well (the problem was
>> confirmed) but there was no solution [2].
>>
>> So for now, I would say we have to assume that our S3 connector is not
>> working. I will start a separate discussion at the developer mailing list
>> to remove our S3 connector.
>>
>> The good news is that you can just use Hadoop's S3 File System
>> implementation with Flink.
>>
>> I used this Flink program to verify it works:
>>
>> public class S3FileSystem {
>>    public static void main(String[] args) throws Exception {
>>       ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
>>       DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
>>       myLines.print();
>>    }
>> }
>>
>> Also, you need to make a Hadoop configuration file available to Flink.
>> When running Flink locally in your IDE, just create a "core-site.xml" in
>> the src/main/resources folder, with the following content:
>>
>> <configuration>
>>
>> <property>
>>   <name>fs.s3n.awsAccessKeyId</name>
>>   <value>putKeyHere</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3n.awsSecretAccessKey</name>
>>   <value>putSecretHere</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3n.impl</name>
>>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>> </property>
>> </configuration>
>>
>> If you are running on a cluster, re-use the existing
>> core-site.xml file (i.e., edit it) and point to its directory using Flink's
>> fs.hdfs.hadoopconf configuration option.
>>
>> With these two things in place, you should be good to go.
>>
>> [1]
>> http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
>> [2]
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-Amazon-S3-td946.html
>>
>> On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <
>> kudryavtsev.konstan...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I'm trying to get Apache Flink 0.9.1 working on EMR, basically to read
>>> data from S3. I tried the following path for data
>>> s3://mybucket.s3.amazonaws.com/folder, but it throws me the following
>>> exception:
>>>
>>> java.io.IOException: Cannot establish connection to Amazon S3:
>>> com.amazonaws.services.s3.model.AmazonS3Exception: The request signature
>>> we calculated does not match the signature you provided. Check your key
>>> and signing method. (Service: Amazon S3; Status Code: 403;
>>>
>>> I added access and secret keys, so the problem is not here. I'm using
>>> the standard region and gave read credentials to everyone.
>>>
>>> Any ideas how it can be fixed?
>>>
>>> Thank you in advance,
>>> Kostia
>>>
>>
>>
>


Re: Processing S3 data with Apache Flink

2015-10-06 Thread KOSTIANTYN Kudriavtsev
Hi Robert,

you are right, I just misspelled the name of the file :(  Everything works fine!

Basically, I'd suggest moving this workaround into the official docs and marking
the custom S3FileSystem as @Deprecated...
In fact, I like the idea of marking all untested functionality with a specific
annotation, for example @Beta, because big enterprises won't want
to use any product where documented features don't work. For example,
it would be difficult for me to advocate using Flink on a project while
S3FileSystem was broken, and my opponents would object "who knows
what else is broken". If some functionality is marked as not properly tested,
it's much easier to make decisions because of the better visibility.
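
As a sketch of the idea (purely hypothetical, not an existing Flink API), such
a marker could be as simple as:

import java.lang.annotation.Documented;

// Hypothetical marker for functionality that is not yet properly tested.
@Documented
public @interface Beta {
    /** Optional note on what is untested or likely to change. */
    String value() default "";
}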

WBR,
Kostia

Thank you,
Konstantin Kudryavtsev

On Tue, Oct 6, 2015 at 2:12 PM, Robert Metzger  wrote:

> Hmm. I tried out the code I posted yesterday and it worked
> immediately.
> The security settings of AWS are sometimes a bit complicated.
> I think there are some logs for S3 buckets, maybe they contain some more
> information.
>
> Maybe there are other users facing the same issue. Since the S3FileSystem
> class is from Hadoop, I suspect the code is widely used, and you can
> probably find answers to the most common problems on Google.
>
>
> On Tue, Oct 6, 2015 at 1:07 PM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Hi Robert,
>>
>> thank you very much for your input!
>>
>> Have you tried that?
>> With org.apache.hadoop.fs.s3native.NativeS3FileSystem I moved forward,
>> and now got a new exception:
>>
>>
>> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
>> for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden
>>
>> It's really strange, as I gave full permissions
>> to authenticated users and can fetch the target file via s3cmd or an S3 browser
>> from the same PC... I realize this question is not really for you, but perhaps
>> you have faced the same issue.
>>
>> Thanks in advance!
>> Kostia
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>> On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger 
>> wrote:
>>
>>> Hi Kostia,
>>>
>>> thank you for writing to the Flink mailing list. I actually started to
>>> try out our S3 File system support after I saw your question on
>>> StackOverflow [1].
>>> I found that our S3 connector is very broken. I had to resolve two more
>>> issues with it, before I was able to get the same exception you reported.
>>>
>>> Another Flink committer looked into the issue as well (the problem was
>>> confirmed) but there was no solution [2].
>>>
>>> So for now, I would say we have to assume that our S3 connector is not
>>> working. I will start a separate discussion at the developer mailing list
>>> to remove our S3 connector.
>>>
>>> The good news is that you can just use Hadoop's S3 File System
>>> implementation with Flink.
>>>
>>> I used this Flink program to verify it works:
>>>
>>> public class S3FileSystem {
>>>    public static void main(String[] args) throws Exception {
>>>       ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
>>>       DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
>>>       myLines.print();
>>>    }
>>> }
>>>
>>> Also, you need to make a Hadoop configuration file available to Flink.
>>> When running Flink locally in your IDE, just create a "core-site.xml" in
>>> the src/main/resources folder, with the following content:
>>>
>>> <configuration>
>>>
>>> <property>
>>>   <name>fs.s3n.awsAccessKeyId</name>
>>>   <value>putKeyHere</value>
>>> </property>
>>>
>>> <property>
>>>   <name>fs.s3n.awsSecretAccessKey</name>
>>>   <value>putSecretHere</value>
>>> </property>
>>>
>>> <property>
>>>   <name>fs.s3n.impl</name>
>>>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>> </property>
>>> </configuration>
>>>
>>> If you are running on a cluster, re-use the existing
>>> core-site.xml file (i.e., edit it) and point to its directory using Flink's
>>> fs.hdfs.hadoopconf configuration option.
>>>
>>> With these two things in place, you should be good to go.
>>>
>>> [1]
>>> http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
>>> [2]
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-Amazon-S3-td946.html
>>>
>>> On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <
>>> kudryavtsev.konstan...@gmail.com> wrote:
>>>
 Hi guys,

 I'm trying to get Apache Flink 0.9.1 working on EMR, basically to read
 data from S3. I tried the following path for data
 s3://mybucket.s3.amazonaws.com/folder, but it throws me the following
 exception:

 java.io.IOException: Cannot establish connection to Amazon S3:
 com.amazonaws.services.s3.model.AmazonS3Exception: The request
 signature
 we calculated does not match the signature you provided. Check your key
 and signing method. (Service: Amazon S3; Status Code: 403;

 I added access and secret keys, so the problem is not here. I'm using
 standard region and gave 

Re: Processing S3 data with Apache Flink

2015-10-06 Thread Robert Metzger
Hi Kostia,

I understand your concern. I am going to propose to the Flink developers to
remove the S3 File System support in Flink.
Also, regarding these annotations, we are actually planning to add them for
the 1.0 release so that users know which interfaces they can rely on.

Which other components of Flink are you planning to use?
I can give you some information on how stable/well tested they are.

Usually, everything in Flink is very well tested, but in case of the S3
connector, it's hard to test automatically, because it concerns an external
component out of our control.


Regards,
Robert



On Tue, Oct 6, 2015 at 1:44 PM, KOSTIANTYN Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> Hi Robert,
>
> you are right, I just misspelled the name of the file :(  Everything works fine!
>
> Basically, I'd suggest moving this workaround into the official docs and marking
> the custom S3FileSystem as @Deprecated...
> In fact, I like the idea of marking all untested functionality with a specific
> annotation, for example @Beta, because big enterprises won't want
> to use any product where documented features don't work. For example,
> it would be difficult for me to advocate using Flink on a project while
> S3FileSystem was broken, and my opponents would object "who knows
> what else is broken". If some functionality is marked as not properly tested,
> it's much easier to make decisions because of the better visibility.
>
> WBR,
> Kostia
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Tue, Oct 6, 2015 at 2:12 PM, Robert Metzger 
> wrote:
>
>> Hmm. I tried out the code I posted yesterday and it worked
>> immediately.
>> The security settings of AWS are sometimes a bit complicated.
>> I think there are some logs for S3 buckets, maybe they contain some more
>> information.
>>
>> Maybe there are other users facing the same issue. Since the S3FileSystem
>> class is from Hadoop, I suspect the code is widely used, and you can
>> probably find answers to the most common problems on Google.
>>
>>
>> On Tue, Oct 6, 2015 at 1:07 PM, KOSTIANTYN Kudriavtsev <
>> kudryavtsev.konstan...@gmail.com> wrote:
>>
>>> Hi Robert,
>>>
>>> thank you very much for your input!
>>>
>>> Have you tried that?
>>> With org.apache.hadoop.fs.s3native.NativeS3FileSystem I moved forward,
>>> and now got a new exception:
>>>
>>>
>>> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
>>> for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden
>>>
>>> It's really strange, as I gave full permissions
>>> to authenticated users and can fetch the target file via s3cmd or an S3 browser
>>> from the same PC... I realize this question is not really for you, but perhaps
>>> you have faced the same issue.
>>>
>>> Thanks in advance!
>>> Kostia
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>> On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger 
>>> wrote:
>>>
 Hi Kostia,

 thank you for writing to the Flink mailing list. I actually started to
 try out our S3 File system support after I saw your question on
 StackOverflow [1].
 I found that our S3 connector is very broken. I had to resolve two more
 issues with it, before I was able to get the same exception you reported.

 Another Flink committer looked into the issue as well (the problem was
 confirmed) but there was no solution [2].

 So for now, I would say we have to assume that our S3 connector is not
 working. I will start a separate discussion at the developer mailing list
 to remove our S3 connector.

 The good news is that you can just use Hadoop's S3 File System
 implementation with Flink.

 I used this Flink program to verify it works:

 public class S3FileSystem {
    public static void main(String[] args) throws Exception {
       ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
       DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
       myLines.print();
    }
 }

 Also, you need to make a Hadoop configuration file available to Flink.
 When running Flink locally in your IDE, just create a "core-site.xml"
 in the src/main/resources folder, with the following content:

 <configuration>

 <property>
   <name>fs.s3n.awsAccessKeyId</name>
   <value>putKeyHere</value>
 </property>

 <property>
   <name>fs.s3n.awsSecretAccessKey</name>
   <value>putSecretHere</value>
 </property>

 <property>
   <name>fs.s3n.impl</name>
   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
 </property>
 </configuration>

 If you are running on a cluster, re-use the existing
 core-site.xml file (i.e., edit it) and point to its directory using Flink's
 fs.hdfs.hadoopconf configuration option.

 With these two things in place, you should be good to go.

 [1]
 http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
 [2]
 

Re: Processing S3 data with Apache Flink

2015-10-05 Thread Robert Metzger
Hi Kostia,

thank you for writing to the Flink mailing list. I actually started to try
out our S3 File system support after I saw your question on StackOverflow
[1].
I found that our S3 connector is very broken. I had to resolve two more
issues with it, before I was able to get the same exception you reported.

Another Flink committer looked into the issue as well (the problem was
confirmed) but there was no solution [2].

So for now, I would say we have to assume that our S3 connector is not
working. I will start a separate discussion at the developer mailing list
to remove our S3 connector.

The good news is that you can just use Hadoop's S3 File System
implementation with Flink.

I used this Flink program to verify it works:

public class S3FileSystem {
   public static void main(String[] args) throws Exception {
      ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
      DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
      myLines.print();
   }
}

Also, you need to make a Hadoop configuration file available to Flink.
When running Flink locally in your IDE, just create a "core-site.xml" in
the src/main/resources folder, with the following content:

<configuration>

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>putKeyHere</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>putSecretHere</value>
</property>

<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
</configuration>

If you are running on a cluster, re-use the existing core-site.xml
file (i.e., edit it) and point to its directory using Flink's
fs.hdfs.hadoopconf configuration option.

With these two things in place, you should be good to go.

[1]
http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
[2]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-Amazon-S3-td946.html

On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> Hi guys,
>
> I'm trying to get Apache Flink 0.9.1 working on EMR, basically to read
> data from S3. I tried the following path for data
> s3://mybucket.s3.amazonaws.com/folder, but it throws me the following
> exception:
>
> java.io.IOException: Cannot establish connection to Amazon S3:
> com.amazonaws.services.s3.model.AmazonS3Exception: The request signature
> we calculated does not match the signature you provided. Check your key
> and signing method. (Service: Amazon S3; Status Code: 403;
>
> I added access and secret keys, so the problem is not here. I'm using
> the standard region and gave read credentials to everyone.
>
> Any ideas how it can be fixed?
>
> Thank you in advance,
> Kostia
>