[jira] [Commented] (HUDI-4859) Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)
[ https://issues.apache.org/jira/browse/HUDI-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605810#comment-17605810 ]

Angel Conde commented on HUDI-4859:

Started to work on this.

> Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)
>
> Key: HUDI-4859
> URL: https://issues.apache.org/jira/browse/HUDI-4859
> Project: Apache Hudi
> Issue Type: Improvement
> Components: docs
> Reporter: Angel Conde
> Assignee: Angel Conde
> Priority: Trivial
> Labels: Blog, docs
> Fix For: 0.13.1
>
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> Hi,
> After a small contribution to Delta Streamer
> [https://github.com/apache/hudi/pull/5630#event-7377963882]
> I got the suggestion to write some docs or a blog post on how to use Hudi on
> serverless platforms.
>
> This ticket is to track that work. The idea is to publish a new blog post with a
> full example of running Hudi on AWS Glue (one of many "serverless" Spark
> platforms).
>
> Further in the future I would like to contribute an integration with AWS
> Glue Registry for Delta Streamer :).
>
> Thanks!

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4859) Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)
[ https://issues.apache.org/jira/browse/HUDI-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angel Conde updated HUDI-4859:

Status: In Progress (was: Open)

> Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)
>
> Key: HUDI-4859
> URL: https://issues.apache.org/jira/browse/HUDI-4859
> Project: Apache Hudi
> Issue Type: Improvement
> Components: docs
> Reporter: Angel Conde
> Assignee: Angel Conde
> Priority: Trivial
> Labels: Blog, docs
> Fix For: 0.13.1
>
> Original Estimate: 1m
> Remaining Estimate: 1m
[jira] [Created] (HUDI-4859) Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)
Angel Conde created HUDI-4859:

Summary: Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)
Key: HUDI-4859
URL: https://issues.apache.org/jira/browse/HUDI-4859
Project: Apache Hudi
Issue Type: Improvement
Components: docs
Reporter: Angel Conde
Assignee: Angel Conde
Fix For: 0.13.1

Hi,
After a small contribution to Delta Streamer
[https://github.com/apache/hudi/pull/5630#event-7377963882]
I got the suggestion to write some docs or a blog post on how to use Hudi on
serverless platforms.

This ticket is to track that work. The idea is to publish a new blog post with a
full example of running Hudi on AWS Glue (one of many "serverless" Spark
platforms).

Further in the future I would like to contribute an integration with AWS
Glue Registry for Delta Streamer :).

Thanks!
[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angel Conde updated HUDI-3994:

Fix Version/s: 0.11.0 (was: 0.12.0)

> HoodieDeltaStreamer - Spark master shouldn't have a default
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer, spark
> Reporter: Angel Conde
> Priority: Trivial
> Labels: easyfix, pull-request-available
> Fix For: 0.11.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When trying to run HoodieDeltaStreamer on AWS Glue, I found that the Spark
> master has no option to inherit from the environment, as it defaults to
> {{local[2]}}. In this kind of serverless environment, where you do not
> have access to the master, this configuration should be inherited.
> This can be seen on line 329 of
> [HoodieDeltaStreamer|https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java].
> {{public String sparkMaster = "local[2]";}}
> This should be changed to support this kind of scenario: there should be a
> JavaSparkContext option where no Spark master is defined.
> *Expected behavior*
> The Spark master shouldn't have a default, as there are some environments
> (usually serverless, such as AWS Glue) where it will be inherited.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angel Conde updated HUDI-3994:

Status: Patch Available (was: In Progress)

> HoodieDeltaStreamer - Spark master shouldn't have a default
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer, spark
> Reporter: Angel Conde
> Priority: Trivial
> Labels: easyfix, pull-request-available
> Fix For: 0.12.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
[jira] [Commented] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529885#comment-17529885 ]

Angel Conde commented on HUDI-3994:

Created pull request: [https://github.com/apache/hudi/pull/5463]

> HoodieDeltaStreamer - Spark master shouldn't have a default
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer, spark
> Reporter: Angel Conde
> Priority: Trivial
> Labels: easyfix, pull-request-available
> Fix For: 0.12.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angel Conde updated HUDI-3994:

Component/s: spark

> HoodieDeltaStreamer - Spark master shouldn't have a default
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer, spark
> Reporter: Angel Conde
> Priority: Trivial
> Labels: easyfix
> Fix For: 0.11.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
[jira] [Commented] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529535#comment-17529535 ]

Angel Conde commented on HUDI-3994:

Will provide a pull request of this.

> HoodieDeltaStreamer - Spark master shouldn't have a default
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Angel Conde
> Priority: Trivial
> Labels: easyfix
> Fix For: 0.11.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angel Conde updated HUDI-3994:

Status: In Progress (was: Open)

> HoodieDeltaStreamer - Spark master shouldn't have a default
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Angel Conde
> Priority: Trivial
> Labels: easyfix
> Fix For: 0.11.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
[jira] [Created] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default
Angel Conde created HUDI-3994:

Summary: HoodieDeltaStreamer - Spark master shouldn't have a default
Key: HUDI-3994
URL: https://issues.apache.org/jira/browse/HUDI-3994
Project: Apache Hudi
Issue Type: Improvement
Components: deltastreamer
Reporter: Angel Conde
Fix For: 0.11.0

When trying to run HoodieDeltaStreamer on AWS Glue, I found that the Spark
master has no option to inherit from the environment, as it defaults to
{{local[2]}}. In this kind of serverless environment, where you do not
have access to the master, this configuration should be inherited.
This can be seen on line 329 of
[HoodieDeltaStreamer|https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java].
{{public String sparkMaster = "local[2]";}}
This should be changed to support this kind of scenario: there should be a
JavaSparkContext option where no Spark master is defined.
*Expected behavior*
The Spark master shouldn't have a default, as there are some environments
(usually serverless, such as AWS Glue) where it will be inherited.
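The behaviour requested in HUDI-3994 can be illustrated with a minimal sketch. It is not Hudi's code and not necessarily the change made in the pull request: `SparkMasterResolver` and `buildConf` are hypothetical names, a plain `Map` stands in for Spark's configuration object so the example runs without Spark on the classpath, and `"yarn"` is only a placeholder for whatever master a managed platform might supply. The idea shown is that the master is written only when the user passes one explicitly; otherwise whatever the environment provides is left untouched.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache Hudi or Spark.
final class SparkMasterResolver {

    private SparkMasterResolver() {
    }

    /**
     * Builds the effective configuration.
     *
     * @param cliMaster value of a --spark-master style option; null or blank means "not given"
     * @param inherited properties assumed to be supplied by the platform (for example AWS Glue)
     * @return a copy of the inherited properties, with "spark.master" overridden only if given
     */
    static Map<String, String> buildConf(String cliMaster, Map<String, String> inherited) {
        Map<String, String> conf = new HashMap<>(inherited);
        if (cliMaster != null && !cliMaster.trim().isEmpty()) {
            // Only an explicit value overrides what the environment provides.
            conf.put("spark.master", cliMaster.trim());
        }
        // No hard-coded fallback such as "local[2]": an unset master stays unset.
        return conf;
    }

    public static void main(String[] args) {
        // "yarn" is a placeholder value, not a statement about what AWS Glue sets.
        Map<String, String> fromPlatform = Collections.singletonMap("spark.master", "yarn");

        // No option given: the platform's master is inherited.
        System.out.println(buildConf(null, fromPlatform));
        // Option given: the explicit value takes precedence.
        System.out.println(buildConf("local[2]", fromPlatform));
        // Nothing given anywhere: the master is simply left undefined.
        System.out.println(buildConf(null, Collections.<String, String>emptyMap()));
    }
}
```

With a default of `"local[2]"` baked into the option, the first case would be impossible to express, which is the problem the issue describes.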