[jira] [Updated] (SPARK-28124) Faster S3 file source with SQS
[ https://issues.apache.org/jira/browse/SPARK-28124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Dixit updated SPARK-28124:
-----------------------------------
Affects Version/s: (was: 2.4.3)
                   3.0.0

> Faster S3 file source with SQS
> ------------------------------
>
> Key: SPARK-28124
> URL: https://issues.apache.org/jira/browse/SPARK-28124
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Abhishek Dixit
> Priority: Major
>
> Using FileStreamSource to read files from an S3 bucket has problems in terms of both cost and latency:
> * *Latency:* Listing all the files in S3 buckets every microbatch can be both slow and resource intensive.
> * *Costs:* Making List API requests to S3 every microbatch can be costly.
>
> The solution is to use Amazon Simple Queue Service (SQS), which lets you find new files written to an S3 bucket without listing all the files every microbatch.
> S3 buckets can be configured to send notifications to an Amazon SQS queue on Object Create / Object Delete events. For details, see the AWS documentation: [Configuring S3 Event Notifications|https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html]
>
> Spark can leverage this to find new files written to an S3 bucket by reading notifications from the SQS queue instead of listing files every microbatch.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
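To illustrate the proposed approach, here is a minimal stdlib-only sketch of the consumer side: turning one SQS message body carrying an S3 event notification into the list of newly created file paths. The payload layout follows the documented S3 event notification format; the bucket and object names are hypothetical, and a real implementation would receive the body by polling the queue (e.g. with an SQS client's receive-message call) rather than from an inline constant.

```python
import json

# A sample message body in the shape S3 delivers to an SQS queue for an
# ObjectCreated event (field names per the S3 event notification format;
# the bucket and key below are made up for illustration).
SAMPLE_BODY = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-input-bucket"},
                "object": {"key": "data/part-0000.json", "size": 1024},
            },
        }
    ]
})


def new_files_from_message(body: str) -> list:
    """Extract (bucket, key) pairs for newly created objects from one
    SQS message body carrying an S3 event notification."""
    paths = []
    for record in json.loads(body).get("Records", []):
        # Only ObjectCreated events signal new input files; delete
        # events and other notifications are skipped.
        if record.get("eventName", "").startswith("ObjectCreated"):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            paths.append((bucket, key))
    return paths


print(new_files_from_message(SAMPLE_BODY))
# [('my-input-bucket', 'data/part-0000.json')]
```

In the proposed source, each microbatch would drain pending queue messages, collect the ObjectCreated paths this way, and read only those files, replacing the per-microbatch List API call that the ticket identifies as the cost and latency bottleneck.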