Ah, I spoke too soon.

I thought the SQS part was going to be a Spark package. It looks like it has to be 
compiled into a jar for use. Am I right? Can someone help with this? I tried to 
compile it using SBT, but I'm stuck with a "SonatypeKeys not found" error.
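
From what I can tell, SonatypeKeys comes from the sbt-sonatype publishing plugin, so 
my guess is that the project's build references it without the plugin being on the 
build classpath. A minimal sketch of what I think belongs in project/plugins.sbt 
(the plugin version here is only a guess):

    // project/plugins.sbt -- declares the sbt-sonatype plugin that defines SonatypeKeys
    addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "1.1")

Since I only need a local jar, another option might be to strip the Sonatype 
publishing settings out of the build entirely and just run "sbt package" (or 
"sbt assembly" for a fat jar, if the sbt-assembly plugin is set up).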

If there’s an easier alternative, please let me know.

Thanks,
Ben


> On Apr 9, 2016, at 2:49 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
> 
> This was easy!
> 
> I just created a notification on a source S3 bucket to kick off a Lambda 
> function that decompresses the dropped file and saves it to another S3 
> bucket. That bucket, in turn, has a notification that sends an SNS message 
> to me via email. I can just as easily set up SQS as the endpoint of this 
> notification, which would then convey the new file's information to a 
> listening Spark Streaming job so it can download it. I like this!
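> 
> For reference, the heart of the Lambda function looks roughly like this (a Scala 
> sketch of the idea, not the exact code I deployed; the target bucket name and the 
> handler wiring are placeholders):
> 
>     import java.util.zip.GZIPInputStream
>     import scala.collection.JavaConverters._
>     import com.amazonaws.services.lambda.runtime.Context
>     import com.amazonaws.services.lambda.runtime.events.S3Event
>     import com.amazonaws.services.s3.AmazonS3Client
>     import com.amazonaws.services.s3.model.ObjectMetadata
> 
>     // Configured in Lambda as the handler "DecompressHandler::handleRequest".
>     class DecompressHandler {
>       private val s3 = new AmazonS3Client()                     // default credential chain
>       private val targetBucket = "example-decompressed-bucket"  // placeholder
> 
>       def handleRequest(event: S3Event, context: Context): Unit = {
>         for (rec <- event.getRecords.asScala) {
>           val srcBucket = rec.getS3.getBucket.getName
>           val srcKey    = rec.getS3.getObject.getKey            // note: may be URL-encoded
>           val unzipped  = new GZIPInputStream(s3.getObject(srcBucket, srcKey).getObjectContent)
>           // Write the decompressed object to the bucket that notifies downstream consumers.
>           s3.putObject(targetBucket, srcKey.stripSuffix(".gz"), unzipped, new ObjectMetadata())
>           unzipped.close()
>         }
>       }
>     }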
> 
> Cheers,
> Ben 
> 
>> On Apr 9, 2016, at 9:54 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>> 
>> This is awesome! I have someplace to start from.
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Apr 9, 2016, at 9:45 AM, programminggee...@gmail.com wrote:
>>> 
>>> Someone please correct me if I am wrong, as I am still rather green to 
>>> Spark, but it appears that through the S3 notification mechanism described 
>>> below you can publish events to SQS and use SQS as a streaming source into 
>>> Spark. The project at https://github.com/imapi/spark-sqs-receiver appears 
>>> to provide libraries for doing this.
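>>> 
>>> If that project does not pan out, a hand-rolled receiver along these lines 
>>> should also work (a rough sketch using the AWS Java SDK, not the project's 
>>> actual API; the queue URL is a placeholder and I have not run this):
>>> 
>>>     import scala.collection.JavaConverters._
>>>     import com.amazonaws.services.sqs.AmazonSQSClient
>>>     import com.amazonaws.services.sqs.model.ReceiveMessageRequest
>>>     import org.apache.spark.storage.StorageLevel
>>>     import org.apache.spark.streaming.receiver.Receiver
>>> 
>>>     // Custom receiver that polls an SQS queue and pushes message bodies
>>>     // (e.g. S3 event notification JSON) into a Spark Streaming DStream.
>>>     class SqsReceiver(queueUrl: String)
>>>       extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
>>> 
>>>       def onStart(): Unit = {
>>>         new Thread("SQS Receiver") {
>>>           override def run(): Unit = poll()
>>>         }.start()
>>>       }
>>> 
>>>       def onStop(): Unit = {}  // the polling loop exits via isStopped()
>>> 
>>>       private def poll(): Unit = {
>>>         val sqs = new AmazonSQSClient()  // default AWS credential chain
>>>         val request = new ReceiveMessageRequest(queueUrl)
>>>           .withMaxNumberOfMessages(10)
>>>           .withWaitTimeSeconds(20)       // long polling
>>>         while (!isStopped()) {
>>>           for (msg <- sqs.receiveMessage(request).getMessages.asScala) {
>>>             store(msg.getBody)                                 // hand the event to Spark
>>>             sqs.deleteMessage(queueUrl, msg.getReceiptHandle)  // ack so it is not redelivered
>>>           }
>>>         }
>>>       }
>>>     }
>>> 
>>> On the Spark side it would be wired up with something like 
>>> ssc.receiverStream(new SqsReceiver(queueUrl)), which yields a DStream of the 
>>> raw notification messages.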
>>> 
>>> Hope this helps.
>>> 
>>> Sent from my iPhone
>>> 
>>> On Apr 9, 2016, at 9:55 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>> 
>>>> Nezih,
>>>> 
>>>> This looks like a good alternative to having the Spark Streaming job check 
>>>> for new files on its own. Do you know if there is a way to have the Spark 
>>>> Streaming job get notified with the new file information and act upon it? 
>>>> That would reduce the overhead and cost of polling S3. Plus, I can use the 
>>>> same mechanism to notify and kick off Lambda to process new data files and 
>>>> make them ready for Spark Streaming to consume, so everything is triggered 
>>>> by notifications. I just need all incoming folders configured with 
>>>> notifications for Lambda and all outgoing folders with notifications for 
>>>> Spark Streaming. This sounds like a better setup than what we have now.
>>>> 
>>>> Thanks,
>>>> Ben
>>>> 
>>>>> On Apr 9, 2016, at 12:25 AM, Nezih Yigitbasi <nyigitb...@netflix.com> wrote:
>>>>> 
>>>>> While it is doable in Spark, S3 also supports notifications: 
>>>>> http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
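>>>>> 
>>>>> For example, routing ObjectCreated events from a bucket to an SQS queue 
>>>>> looks roughly like this with the AWS Java SDK (a sketch; the bucket name, 
>>>>> configuration name, and queue ARN are placeholders):
>>>>> 
>>>>>     import java.util.EnumSet
>>>>>     import com.amazonaws.services.s3.AmazonS3Client
>>>>>     import com.amazonaws.services.s3.model.{BucketNotificationConfiguration,
>>>>>       QueueConfiguration, S3Event}
>>>>> 
>>>>>     // Publish an event to the queue every time an object is created in the bucket.
>>>>>     val s3 = new AmazonS3Client()
>>>>>     val notification = new BucketNotificationConfiguration().addConfiguration(
>>>>>       "new-files-to-sqs",
>>>>>       new QueueConfiguration("arn:aws:sqs:us-east-1:123456789012:example-queue",
>>>>>         EnumSet.of(S3Event.ObjectCreated)))
>>>>>     s3.setBucketNotificationConfiguration("example_bucket", notification)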
>>>>> 
>>>>> 
>>>>> On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande <nlaucha...@gmail.com> wrote:
>>>>> Hi Benjamin,
>>>>> 
>>>>> I have done it. The critical configuration items are the ones below:
>>>>> 
>>>>>     // Configure the s3n filesystem with AWS credentials. AccessKeyId and
>>>>>     // AWSSecretAccessKey are assumed to be defined elsewhere in the job.
>>>>>     ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
>>>>>       "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>>>>     ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AccessKeyId)
>>>>>     ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", AWSSecretAccessKey)
>>>>> 
>>>>>     // Use the s3n:// scheme so the settings above actually apply.
>>>>>     val inputS3Stream = ssc.textFileStream("s3n://example_bucket/folder")
>>>>> 
>>>>> This code will probe for new S3 files created in your bucket every batch 
>>>>> interval.
>>>>> 
>>>>> Thanks,
>>>>> Natu
>>>>> 
>>>>> On Fri, Apr 8, 2016 at 9:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>> Has anyone monitored an S3 bucket or directory using Spark Streaming and 
>>>>> pulled any new files to process? If so, can you provide basic Scala 
>>>>> coding help on this?
>>>>> 
>>>>> Thanks,
>>>>> Ben
>>>>> 
>>>>> 
>>>> 
>> 
> 
