[jira] [Commented] (HUDI-914) support different target data clusters

2020-06-15 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135827#comment-17135827
 ] 

Vinoth Chandar commented on HUDI-914:
-

I see, clear now. Thanks!

It seems like a very valid use case.

> support different target data clusters
> --
>
> Key: HUDI-914
> URL: https://issues.apache.org/jira/browse/HUDI-914
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, Hudi DeltaStreamer does not support writing to different target 
> clusters. The specific scenario is as follows: Hudi tasks generally run on 
> an independent cluster. To write data to the target data cluster, you 
> generally rely on core-site.xml and hdfs-site.xml. Sometimes, however, data 
> must be written to a different target data cluster, and the cluster running 
> the Hudi task does not have the core-site.xml and hdfs-site.xml of that 
> target cluster. Although data can be written by specifying the NameNode IP 
> address of the target cluster, this loses HDFS high availability, so I plan 
> to use the contents of the target cluster's core-site.xml and hdfs-site.xml 
> files as configuration items and configure them in Hudi's 
> dfs-source.properties or kafka-source.properties file.
> Is there a better way to solve this problem?
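
The proposal in the description can be sketched roughly as follows. This is
only an illustration, not the implementation adopted for this issue: the
`target.hadoop.` prefix and both helper functions are hypothetical, and the
host names are placeholders. The HDFS keys shown are the standard client-side
settings for an HA nameservice, which is what a bare NameNode IP address
cannot express.

```python
# Hypothetical prefix for target-cluster settings; not an actual Hudi namespace.
TARGET_PREFIX = "target.hadoop."


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def target_hadoop_conf(props, prefix=TARGET_PREFIX):
    """Return only the prefixed entries, with the prefix stripped."""
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}


# Example contents of a dfs-source.properties style file (values are placeholders).
EXAMPLE = """\
# ordinary source settings are left untouched
hoodie.deltastreamer.source.dfs.root=/data/input
target.hadoop.fs.defaultFS=hdfs://targetns
target.hadoop.dfs.nameservices=targetns
target.hadoop.dfs.ha.namenodes.targetns=nn1,nn2
target.hadoop.dfs.namenode.rpc-address.targetns.nn1=nn1.example.com:8020
target.hadoop.dfs.namenode.rpc-address.targetns.nn2=nn2.example.com:8020
target.hadoop.dfs.client.failover.proxy.provider.targetns=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
"""

conf = target_hadoop_conf(parse_properties(EXAMPLE))
for key in sorted(conf):
    print(key, "=", conf[key])
```

The resulting dictionary holds the target cluster's settings under their
usual Hadoop names, so they could be applied on top of whatever configuration
the cluster running the task already provides.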



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-914) support different target data clusters

2020-06-12 Thread liujinhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134637#comment-17134637
 ] 

liujinhui commented on HUDI-914:


The DeltaStreamer task always runs on one particular cluster, but 
DeltaStreamer should be able to write the Hudi table to any cluster, 
including syncing the data to Hive.
[~vinoth]



[jira] [Commented] (HUDI-914) support different target data clusters

2020-06-11 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133406#comment-17133406
 ] 

Vinoth Chandar commented on HUDI-914:
-

So they want to split a Hudi table across two Spark/YARN clusters or, say, 
two HDFS clusters or two S3 buckets? Can you explain more, please?



[jira] [Commented] (HUDI-914) support different target data clusters

2020-06-08 Thread liujinhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128757#comment-17128757
 ] 

liujinhui commented on HUDI-914:


Some business teams only want the Hudi dataset to appear on their own 
clusters and do not want to deal with the specific tasks that write it.
[~vinoth]



[jira] [Commented] (HUDI-914) support different target data clusters

2020-05-20 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112128#comment-17112128
 ] 

Vinoth Chandar commented on HUDI-914:
-

For my understanding, what's a specific scenario where you cannot run on the 
target cluster but have to run the Hudi writer from another cluster?



[jira] [Commented] (HUDI-914) support different target data clusters

2020-05-20 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112126#comment-17112126
 ] 

Vinoth Chandar commented on HUDI-914:
-

>Although specifying the namenode IP address of the target cluster can be 
>written, this loses HDFS high availability 
I think you are referring to the fact that the other configs for the HA 
NameNode won't be picked up, for example?

I think having a way to explicitly pick up configuration for the target 
cluster in DeltaStreamer and the data source (IIUC you will just be 
augmenting the sparkContext with these additional configurations) is a good 
addition.
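
One way to picture "augmenting the sparkContext" is through Spark's
`spark.hadoop.*` convention: a Spark property named `spark.hadoop.<key>` is
copied into the job's Hadoop configuration as `<key>`. The sketch below is
only an illustration under that assumption, not the approach chosen for this
issue. The helper name and the sample values are hypothetical.

```python
def to_spark_submit_args(hadoop_conf):
    """Turn Hadoop key/value pairs into spark-submit --conf arguments.

    Each key is prefixed with "spark.hadoop." so that Spark passes it
    through to the Hadoop configuration used by the job.
    """
    args = []
    for key in sorted(hadoop_conf):
        args += ["--conf", f"spark.hadoop.{key}={hadoop_conf[key]}"]
    return args


# Placeholder settings for a target HA nameservice (values are illustrative).
target_conf = {
    "dfs.nameservices": "targetns",
    "dfs.ha.namenodes.targetns": "nn1,nn2",
}

print(" ".join(to_spark_submit_args(target_conf)))
```

Passing the settings this way keeps them attached to a single job, so the
cluster-wide core-site.xml and hdfs-site.xml on the cluster running the task
do not need to change.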



[jira] [Commented] (HUDI-914) support different target data clusters

2020-05-19 Thread liujinhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111259#comment-17111259
 ] 

liujinhui commented on HUDI-914:


[~vinothchandar]



[jira] [Commented] (HUDI-914) support different target data clusters

2020-05-19 Thread liujinhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111257#comment-17111257
 ] 

liujinhui commented on HUDI-914:


[~yanghua] 
