[jira] [Commented] (HUDI-4859) Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)

2022-09-16 Thread Angel Conde (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605810#comment-17605810
 ] 

Angel Conde  commented on HUDI-4859:


Started working on this.

> Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue) 
> 
>
> Key: HUDI-4859
> URL: https://issues.apache.org/jira/browse/HUDI-4859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: docs
>Reporter: Angel Conde 
>Assignee: Angel Conde 
>Priority: Trivial
>  Labels: Blog, docs
> Fix For: 0.13.1
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4859) Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)

2022-09-16 Thread Angel Conde (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angel Conde  updated HUDI-4859:
---
Status: In Progress  (was: Open)

> Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue) 
> 
>
> Key: HUDI-4859
> URL: https://issues.apache.org/jira/browse/HUDI-4859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: docs
>Reporter: Angel Conde 
>Assignee: Angel Conde 
>Priority: Trivial
>  Labels: Blog, docs
> Fix For: 0.13.1
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m





[jira] [Created] (HUDI-4859) Adding a blog on how to run Hudi on Serverless Platforms (AWS Glue)

2022-09-16 Thread Angel Conde (Jira)
Angel Conde  created HUDI-4859:
--

 Summary: Adding a blog on how to run Hudi on Serverless Platforms 
(AWS Glue) 
 Key: HUDI-4859
 URL: https://issues.apache.org/jira/browse/HUDI-4859
 Project: Apache Hudi
  Issue Type: Improvement
  Components: docs
Reporter: Angel Conde 
Assignee: Angel Conde 
 Fix For: 0.13.1


Hi,

After a small contribution to Delta Streamer
[https://github.com/apache/hudi/pull/5630#event-7377963882],
I got the suggestion to write some docs or a blog post on how to use Hudi on
serverless platforms.

This ticket is to track that work. The idea is to publish a new blog post with a
full example of running Hudi on AWS Glue (one of many "serverless" Spark
platforms).

Further in the future, I would like to contribute an integration with the AWS Glue
Registry for Delta Streamer :).

Thanks!





[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-29 Thread Angel Conde (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angel Conde  updated HUDI-3994:
---
Fix Version/s: 0.11.0
   (was: 0.12.0)

> HoodieDeltaStreamer - Spark master shouldn't have a default
> ---
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer, spark
>Reporter: Angel Conde 
>Priority: Trivial
>  Labels: easyfix, pull-request-available
> Fix For: 0.11.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h





[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-29 Thread Angel Conde (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angel Conde  updated HUDI-3994:
---
Status: Patch Available  (was: In Progress)

> HoodieDeltaStreamer - Spark master shouldn't have a default
> ---
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer, spark
>Reporter: Angel Conde 
>Priority: Trivial
>  Labels: easyfix, pull-request-available
> Fix For: 0.12.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h





[jira] [Commented] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-29 Thread Angel Conde (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529885#comment-17529885
 ] 

Angel Conde  commented on HUDI-3994:


Created pull request: 

[https://github.com/apache/hudi/pull/5463]


> HoodieDeltaStreamer - Spark master shouldn't have a default
> ---
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer, spark
>Reporter: Angel Conde 
>Priority: Trivial
>  Labels: easyfix, pull-request-available
> Fix For: 0.12.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h





[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angel Conde  updated HUDI-3994:
---
Component/s: spark

> HoodieDeltaStreamer - Spark master shouldn't have a default
> ---
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer, spark
>Reporter: Angel Conde 
>Priority: Trivial
>  Labels: easyfix
> Fix For: 0.11.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h





[jira] [Commented] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529535#comment-17529535
 ] 

Angel Conde  commented on HUDI-3994:


Will provide a pull request for this.

> HoodieDeltaStreamer - Spark master shouldn't have a default
> ---
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Angel Conde 
>Priority: Trivial
>  Labels: easyfix
> Fix For: 0.11.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h





[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angel Conde  updated HUDI-3994:
---
Status: In Progress  (was: Open)

> HoodieDeltaStreamer - Spark master shouldn't have a default
> ---
>
> Key: HUDI-3994
> URL: https://issues.apache.org/jira/browse/HUDI-3994
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Angel Conde 
>Priority: Trivial
>  Labels: easyfix
> Fix For: 0.11.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h





[jira] [Created] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)
Angel Conde  created HUDI-3994:
--

 Summary: HoodieDeltaStreamer - Spark master shouldn't have a 
default
 Key: HUDI-3994
 URL: https://issues.apache.org/jira/browse/HUDI-3994
 Project: Apache Hudi
  Issue Type: Improvement
  Components: deltastreamer
Reporter: Angel Conde 
 Fix For: 0.11.0


When trying to run HoodieDeltaStreamer on AWS Glue, I found that the Spark
master cannot be inherited from the environment, because it defaults to
{{local[2]}}. In serverless environments like this one, where you do not
have access to the master, this configuration should be inherited.

This can be seen on line 329 of
[HoodieDeltaStreamer|https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java]:

{{public String sparkMaster = "local[2]";}}

This should be changed to support these scenarios. There should be a
JavaSparkContext option in which no Spark master is defined.

*Expected behavior*

The Spark master shouldn't have a default, because there are environments
(usually serverless ones, such as AWS Glue) where it is inherited.
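Below is a minimal sketch of the expected behavior in Java, the language HoodieDeltaStreamer is written in. The class name SparkMasterResolver and its resolve method are hypothetical and only illustrate the idea. They are not Hudi's API or the change made in the linked pull request. If a master is configured explicitly, it is applied. Otherwise nothing is set, so the master supplied by the environment wins and no hard-coded {{local[2]}} fallback is used.

```java
import java.util.Optional;

// Hypothetical helper (not part of Hudi): decides whether a launcher should
// call SparkConf.setMaster(...) at all.
final class SparkMasterResolver {

  private SparkMasterResolver() {}

  // Returns the master to set explicitly, or Optional.empty() when none was
  // configured. An empty result means "leave spark.master alone" so that the
  // value supplied by the environment is used; there is no "local[2]" default.
  static Optional<String> resolve(String configuredMaster) {
    if (configuredMaster == null || configuredMaster.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(configuredMaster.trim());
  }

  // Intended use with Spark (needs spark-core on the classpath, so it is shown
  // only as a comment). new SparkConf() loads spark.* JVM system properties,
  // which is how spark-submit hands spark.master to the driver.
  //
  //   SparkConf conf = new SparkConf().setAppName("delta-streamer");
  //   SparkMasterResolver.resolve(cfg.sparkMaster).ifPresent(conf::setMaster);
  //   JavaSparkContext jsc = new JavaSparkContext(conf);
}
```

The commented usage assumes that {{cfg.sparkMaster}} holds the command-line value with its {{local[2]}} default removed. Running locally would then require passing a master explicitly, for example {{local[2]}}, instead of relying on an implicit default.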


