All,
I have more of a general Scala JSON question.
I have set up a notification on the S3 source bucket that triggers a Lambda
function to unzip the new file placed there. Then, it saves the unzipped CSV
file into another destination bucket, where a notification is sent to an SQS
queue. The
Ah, I spoke too soon.
I thought the SQS part was going to be a Spark package. It looks like it has to
be compiled into a jar for use. Am I right? Can someone help with this? I tried
to compile it using SBT, but I’m stuck with a SonatypeKeys not found error.
If there’s an easier alternative,
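(For what it’s worth, a "SonatypeKeys not found" error usually means the build references the sbt-sonatype plugin without it being on the plugin classpath. A guess at a fix — the plugin version here is an assumption — would be adding it to project/plugins.sbt:)

```scala
// project/plugins.sbt — hypothetical fix; the version number is an assumption
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "1.1")
```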
This was easy!
I just created a notification on a source S3 bucket to kick off a Lambda
function that would decompress the dropped file and save it to another S3
bucket. In turn, this S3 bucket has a notification to send an SNS message to
me via email. I can just as easily set up SQS to be the
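The decompress step of a Lambda like that could be sketched in Scala as below. This is a minimal sketch assuming gzip input; the S3 GetObject/PutObject calls via the AWS SDK are omitted:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.GZIPInputStream

// Decompress a gzip payload in memory. In the real Lambda, the bytes
// would come from the source bucket's object and the result would be
// written to the destination bucket.
def gunzip(compressed: Array[Byte]): Array[Byte] = {
  val in  = new GZIPInputStream(new ByteArrayInputStream(compressed))
  val out = new ByteArrayOutputStream()
  val buf = new Array[Byte](4096)
  var n = in.read(buf)
  while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
  in.close()
  out.toByteArray
}
```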
why not use AWS Lambda?
Regards,
Gourav
On Fri, Apr 8, 2016 at 8:14 PM, Benjamin Kim wrote:
> Has anyone monitored an S3 bucket or directory using Spark Streaming and
> pulled any new files to process? If so, can you provide basic Scala coding
> help on this?
>
> Thanks,
>
Natu, Benjamin,
With this mechanism you can configure notifications for *buckets* (if you
only care about some key prefixes you can take a look at object key name
filtering, see the docs) for various event types, and then these events
can be published to SNS, SQS or Lambdas. I think using SQS as
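As a concrete illustration, a bucket notification configuration with key-name filtering that publishes ObjectCreated events to an SQS queue might look like this (the queue ARN, account number, prefix, and suffix are made-up placeholders), e.g. for use with `aws s3api put-bucket-notification-configuration`:

```json
{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:incoming-files",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "incoming/" },
            { "Name": "suffix", "Value": ".csv" }
          ]
        }
      }
    }
  ]
}
```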
Do you know if textFileStream can see if new files are created underneath a
whole bucket?
Only at the level of the folder that you specify. They don't do
subfolders. So your approach would be detecting everything under path
s3://bucket/path/2016040902_data.csv
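(In a sketch — the bucket name and batch interval here are made up — only files landing directly under the monitored prefix are picked up:)

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("s3-monitor"), Seconds(30))
// Watches only this exact "folder"; new objects created in deeper
// subfolders (e.g. date-partitioned paths like .../00/01/data.csv)
// are not seen by this stream.
val lines = ssc.textFileStream("s3n://bucket/path/")
```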
Also, will Spark Streaming not
This is awesome! I have someplace to start from.
Thanks,
Ben
> On Apr 9, 2016, at 9:45 AM, programminggee...@gmail.com wrote:
>
> Someone please correct me if I am wrong as I am still rather green to spark,
> however it appears that through the S3 notification mechanism described
> below,
Someone please correct me if I am wrong, as I am still rather green to Spark;
however, it appears that through the S3 notification mechanism described below,
you can publish events to SQS and use SQS as a streaming source into Spark. The
project at https://github.com/imapi/spark-sqs-receiver
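I haven't used that project, but a hand-rolled equivalent would be a custom Spark Streaming Receiver that polls SQS and stores each S3 event JSON body into the stream. A rough sketch using the AWS Java SDK (class and thread names are mine; client construction and error handling are simplified):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import scala.collection.JavaConverters._

class SqsReceiver(queueUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("sqs-poller") {
      override def run(): Unit = {
        val sqs = AmazonSQSClientBuilder.defaultClient()
        while (!isStopped()) {
          // Each message body is the S3 event notification JSON,
          // which names the bucket and key of the new object.
          sqs.receiveMessage(queueUrl).getMessages.asScala.foreach { m =>
            store(m.getBody)
            sqs.deleteMessage(queueUrl, m.getReceiptHandle)
          }
        }
      }
    }.start()
  }

  def onStop(): Unit = () // the polling thread exits via isStopped()
}
```

It would then be wired in with `ssc.receiverStream(new SqsReceiver(queueUrl))`.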
Nezih,
This looks like a good alternative to having the Spark Streaming job check for
new files on its own. Do you know if there is a way to have the Spark Streaming
job get notified with the new file information and act upon it? This can reduce
the overhead and cost of polling S3. Plus, I can
Natu,
Do you know if textFileStream can see if new files are created underneath a
whole bucket? For example, if the bucket name is incoming and new files
underneath it are 2016/04/09/00/00/01/data.csv and
2016/04/09/00/00/02/data.csv, will these files be picked up? Also, will Spark
Streaming
Can you elaborate a bit more on your approach using S3 notifications? Just
curious; I'm dealing with a similar issue right now that might benefit from
this.
On 09 Apr 2016 9:25 AM, "Nezih Yigitbasi" wrote:
> While it is doable in Spark, S3 also supports notifications:
>
While it is doable in Spark, S3 also supports notifications:
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande wrote:
> Hi Benjamin,
>
> I have done it. The critical configuration items are the ones below
Hi Benjamin,
I have done it. The critical configuration items are the ones below:

    ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
      "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AccessKeyId)
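Filled out into a runnable shape — the secret-key property, bucket path, and batch interval below are my assumptions, not from Natu's mail, and `AccessKeyId`/`SecretKey` are assumed to be defined elsewhere — it might look like:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("S3FileMonitor")
val ssc  = new StreamingContext(conf, Seconds(60))

ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
  "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AccessKeyId)
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", SecretKey)

// Picks up files that appear directly under this prefix each batch.
val csvLines = ssc.textFileStream("s3n://destination-bucket/unzipped/")
csvLines.foreachRDD(rdd => println(s"new records: ${rdd.count()}"))

ssc.start()
ssc.awaitTermination()
```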
Has anyone monitored an S3 bucket or directory using Spark Streaming and pulled
any new files to process? If so, can you provide basic Scala coding help on
this?
Thanks,
Ben