Hi Benjamin,

I have done it. The critical configuration items are the ones below:
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AccessKeyId)
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", AWSSecretAccessKey)

val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")

This code will probe for new S3 files created in the bucket during every batch interval. Note that the URI scheme (s3n://) needs to match the fs.s3n.* configuration keys set above.

Thanks,
Natu

On Fri, Apr 8, 2016 at 9:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
> Has anyone monitored an S3 bucket or directory using Spark Streaming and
> pulled any new files to process? If so, can you provide basic Scala coding
> help on this?
>
> Thanks,
> Ben
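For completeness, here is a minimal sketch of the surrounding setup that the snippet above assumes (an existing `ssc`). The app name, the 60-second batch interval, the use of environment variables for credentials, and the foreachRDD handler are all assumptions for illustration, not part of the original code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3FileMonitor {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("S3FileMonitor")
    // Batch interval of 60 seconds: this is how often the bucket is probed
    // for newly created files. Adjust to your latency needs.
    val ssc = new StreamingContext(conf, Seconds(60))

    // Configure the s3n filesystem; credentials are read from environment
    // variables here (an assumption) rather than hard-coded.
    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // textFileStream only picks up files created in the directory after
    // the streaming context starts; pre-existing files are ignored.
    val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")

    inputS3Stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        println(s"New lines in this batch: ${rdd.count()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This is a sketch only; running it requires a Spark installation with the s3n connector on the classpath and valid AWS credentials, so treat it as a starting point rather than a drop-in program.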