Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
XComp merged PR #23836: URL: https://github.com/apache/flink/pull/23836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1896272486 @XComp sure, coming up shortly. Thanks for the review, @afedulov. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
XComp commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1896250037 Sure. Could you provide backport PRs for 1.18 and 1.17? Then I can merge all the necessary PRs at once. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
afedulov commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1894316627 Unfortunately there are no definitive guidelines for such edge cases in Flink. That said, I believe the current behavior can be justifiably considered a bug that your PR fixes. The risk that someone might currently rely on the broken behavior and gets affected is there, but the chances are pretty low. LGTM. @XComp could you please merge? (I am still waiting for my ICLA to get stamped) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1887371839 My suspicion is that there's an extremely small chance of that but I have no way to quantify it. In our case it was actually the other way around: we set this option and were reading and writing to GCS, however we were writing in a custom sink that used the GCS SDK directly, so everything worked as expected. It was only when we migrated to using the normal file sink that this issue arose. Does the Flink project have guidelines to approaching fixing this sort of issue, where users may depend on a definitely-broken behavior? If we cannot fix it directly (before Flink 2.0, I assume), do you have a suggestion for another workaround—perhaps just a new config key altogether? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
afedulov commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1883003220 > But yes, the main issue is the inconsistent behavior The purpose is clear and I think it is a good idea to improve it. My only concern is a scenario like this: - A user has a pipeline that reads from some local GCS implementation and writes it into the actual cloud GCS(root GCS url configured via Hadoop). This works because the sink currently ignores this setting and defaults to the Google URL - Merging the proposed change unexpectedly breaks the pipeline because it loops data back into the local store instead of Google GCS. Can we be confident that such does not exist in the wild? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1874157629 I do not think there is much room for unexpected behavior here. The surface area of the change is quite small and follows the existing model for translating a Hadoop config option specific to this filesystem implementation into its native SDK equivalent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1873963500 @afedulov it is quite popular to use alternate endpoint URLs for blob storage services with S3, less so for GCS. I would actually expect it to be more popular, as testing jobs that interact with blob storage without actually having to read/write over the internet is desirable, but it's clearly not so common with Flink else others would have hit this. But yes, the main issue is the inconsistent behavior: Flink relies on Hadoop's filesystem implementations in many cases, and they can be configured using the Hadoop configuration options. However, Flink has chosen to use its own custom implementations in certain cases but has not ensured they can be configured the same way. A user would reasonably expect they can use the Hadoop configuration options to configure this Flink FS impl (which has "Hadoop" in the name), but in fact nearly no configuration options for the write codepath are exposed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
afedulov commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1852263941 @patricklucas My understanding is that one would typically want to override `fs.gs.storage.root.url` only in very specific scenarios, such as the one you mentioned with `fake-gcs-server`. I would like to understand if this change could lead to unexpected consequences for Flink GCS users. It appears that: - Users are likely to have the `fs.gs.storage.root.url` parameter set only in rare cases. - If it is set, they probably expect both the GCS source and GCS sink to use the URL. Is this assessment accurate? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
XComp commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1851864223 @afedulov do you have capacity to review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1851710390 @XComp before this becomes a dead PR, do you know whom might be a good candidate to ask for a review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1837516482 I considered a slightly different approach, delegating the lookup of the config option to the GCS Hadoop connector itself: https://github.com/apache/flink/compare/master...patricklucas:flink:FLINK-33694_gs_fs_host_v2 If I get some feedback about which of these options is preferred, I can update the PR accordingly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1835801919 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1833953627 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
flinkbot commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1833417452 ## CI report: * da5b66cd0bfca529b8f28869721534b2c375309a UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33694][gs-fs-hadoop] Support overriding GCS root URL [flink]
patricklucas commented on PR #23836: URL: https://github.com/apache/flink/pull/23836#issuecomment-1833404691 Once this is accepted, I am ready with backport branches for 1.18 and 1.17. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org