Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-12 Thread via GitHub


MartijnVisser commented on code in PR #23489:
URL: https://github.com/apache/flink/pull/23489#discussion_r1424285751


##
flink-filesystems/flink-gs-fs-hadoop/pom.xml:
##
@@ -188,6 +212,29 @@ under the License.
shade


+   
+   
+   
org.apache.flink:flink-fs-hadoop-shaded
+   

+   
com/google/common/**
+   
org/checkerframework/**
+   
com/google/errorprone/**
+   
com/google/j2objc/**
+   
com/google/thirdparty/publicsuffix/**
+   

+   
+   
+   
*
+   

+   
properties.dtd
+   
PropertyList-1.0.dtd
+   
mozilla/**
+   
META-INF/maven/**
+   
META-INF/NOTICE.txt
+   
META-INF/LICENSE.txt
+   

+   
+   

Review Comment:
   We shouldn't exclude these dependencies during shading, but exclude them 
from when we're adding the dependency.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-12 Thread via GitHub


MartijnVisser commented on code in PR #23489:
URL: https://github.com/apache/flink/pull/23489#discussion_r1424292057


##
flink-filesystems/flink-gs-fs-hadoop/pom.xml:
##
@@ -204,20 +251,13 @@ under the License.

org.apache.flink.runtime.util

org.apache.flink.fs.gs.org.apache.flink.runtime.util

+   
+   
+   
com.google.common
+   
org.apache.flink.fs.gs.com.google.common

Review Comment:
   This wouldn't work, right? Because I'm assuming GCS is looking for 
`com.google.common` and not `org.apache.flink.fs` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-12 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1852416614

   > Introduced dependency on guava version(`31.1-jre`) required by 
`google-cloud-storage`.
   
   I'm still kind of puzzled. If this is required by `google-cloud-storage`, 
shouldn't it have been bundled as a dependency? Why should Flink have to do 
that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-12 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1852784610

   @singhravidutt I've created 
https://github.com/apache/flink/commit/2ebe7f6e090690486c1954099a2b57283c578192 
to test it out on private CI. Let me know what you think


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-13 Thread via GitHub


singhravidutt commented on code in PR #23489:
URL: https://github.com/apache/flink/pull/23489#discussion_r1425171966


##
flink-filesystems/flink-gs-fs-hadoop/pom.xml:
##
@@ -188,6 +212,29 @@ under the License.
shade


+   
+   
+   
org.apache.flink:flink-fs-hadoop-shaded
+   

+   
com/google/common/**
+   
org/checkerframework/**
+   
com/google/errorprone/**
+   
com/google/j2objc/**
+   
com/google/thirdparty/publicsuffix/**
+   

+   
+   
+   
*
+   

+   
properties.dtd
+   
PropertyList-1.0.dtd
+   
mozilla/**
+   
META-INF/maven/**
+   
META-INF/NOTICE.txt
+   
META-INF/LICENSE.txt
+   

+   
+   

Review Comment:
   `flink-fs-hadoop-shaded` is a shaded jar which doesn't relocate the classes 
mention in the exclusion rules here. The only way I see we can exclude them is 
by mentioning them explicitly here.
   



##
flink-filesystems/flink-gs-fs-hadoop/pom.xml:
##
@@ -204,20 +251,13 @@ under the License.

org.apache.flink.runtime.util

org.apache.flink.fs.gs.org.apache.flink.runtime.util

+   
+   
+   
com.google.common
+   
org.apache.flink.fs.gs.com.google.common

Review Comment:
   With shading it would rewrite the import:`com.google.common` in classes of 
module:`flink-gs-fs-hadoop`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-13 Thread via GitHub


MartijnVisser commented on code in PR #23489:
URL: https://github.com/apache/flink/pull/23489#discussion_r1425195367


##
flink-filesystems/flink-gs-fs-hadoop/pom.xml:
##
@@ -204,20 +251,13 @@ under the License.

org.apache.flink.runtime.util

org.apache.flink.fs.gs.org.apache.flink.runtime.util

+   
+   
+   
com.google.common
+   
org.apache.flink.fs.gs.com.google.common

Review Comment:
   Yes but the code from GCS still expects the import to be `com.google.common` 
because it doesn't know about the shaded class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-13 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1853706439

   @singhravidutt Can I get your thoughts on 
https://github.com/apache/flink/pull/23920 as an alternative solution?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-13 Thread via GitHub


JingGe commented on code in PR #23489:
URL: https://github.com/apache/flink/pull/23489#discussion_r1425851930


##
flink-filesystems/flink-gs-fs-hadoop/pom.xml:
##
@@ -188,6 +212,29 @@ under the License.
shade


+   
+   
+   
org.apache.flink:flink-fs-hadoop-shaded
+   


Review Comment:
   IIUC, we don't need to exclude shaded guava classes since it will not 
disturb if a right guava version dependency is in the classpath.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-13 Thread via GitHub


JingGe commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1854677694

   Hi folks, thanks for driving it.
   
   @MartijnVisser my understanding is that your version won't work, since there 
still a `guava 27.0-jre` in the classpath.
   
   @singhravidutt: https://github.com/apache/flink/pull/23920 from 
@MartijnVisser actually shows the right direction. We don't need to exclude 
guava in `google-cloud-storage` and then build an explicit guava dependency. 
Afaik your version should also work but add extra effort of building guava 
dependency instead of leverage the dependency `google-cloud-storage` already 
has, i.e. if the next version of `google-cloud-storage` needs a new guava 
version, we have to change the guava dependency accordingly again and only a 
few people know it.
   
   I would suggest a solution combine your thoughts plus a small suggestion 
from me:
   
   1. exclude guava in `flink-fs-hadoop-shaded ` thought from @singhravidutt 
   2. remove the new guava dependency created in this pom, thought from 
@MartijnVisser, 1+2 -> make sure there is no guava in the classpath
   3. remove the exclude guava in `google-cloud-storage`, i.e. enable the gcs 
built-in guava dependency -> make sure `google-cloud-storage` has the right 
guava version in the classpath, thought from @MartijnVisser 
   4. remove the exclude guava in `org.apache.flink:flink-fs-hadoop-shaded` as 
I commented above. 
   
   Please correct me if I am wrong. Look forward to your thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-14 Thread via GitHub


singhravidutt commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1855726357

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-14 Thread via GitHub


singhravidutt commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1857360296

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-15 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1857461633

   @JingGe Thanks for the comments. I've been looking at different options to 
shade the artifacts, and also try that with a local application that writes to 
GCS. Unfortunately, I haven't gotten that to work yet :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-15 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1857672990

   @JingGe @singhravidutt I think I've cracked it :)
   
   1. I've setup a GCS bucket and created a checkpointing test job in order to 
reproduce the error that's happening in Flink 1.18.0. That has worked:
   
   ```
   2023-12-15 11:42:18,119 ERROR 
org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: Thread 
'jobmanager-io-thread-5' produced an uncaught exception. Stopping the process...
   java.lang.NoSuchMethodError: 'com.google.common.collect.ImmutableMap 
com.google.common.collect.ImmutableMap$Builder.buildOrThrow()'
   ```
   
   2. I've tested my PR (which is based on 1.19-SNAPSHOT), with that 
checkpointing works once more:
   
   ```
   2023-12-15 11:32:01,039 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering 
checkpoint 1 (type=CheckpointType{name='Checkpoint', 
sharingFilesStrategy=FORWARD_BACKWARD}) @ 1702636321029 for job 
d9bd270aefde98f6b9aec07f39e6e08b.
   2023-12-15 11:32:01,931 INFO  org.apache.flink.fs.gs.GSFileSystem
  [] - Creating GSRecoverableWriter with file-system options 
GSFileSystemOptions{writerTemporaryBucketName=Optional.empty, 
writerChunkSize=Optional.empty}
   2023-12-15 11:32:03,557 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Completed 
checkpoint 1 for job d9bd270aefde98f6b9aec07f39e6e08b (4201 bytes, 
checkpointDuration=2526 ms, finalizationTime=2 ms).
   2023-12-15 11:32:06,031 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering 
checkpoint 2 (type=CheckpointType{name='Checkpoint', 
sharingFilesStrategy=FORWARD_BACKWARD}) @ 1702636326028 for job 
d9bd270aefde98f6b9aec07f39e6e08b.
   2023-12-15 11:32:06,899 INFO  org.apache.flink.fs.gs.GSFileSystem
  [] - Creating GSRecoverableWriter with file-system options 
GSFileSystemOptions{writerTemporaryBucketName=Optional.empty, 
writerChunkSize=Optional.empty}
   2023-12-15 11:32:08,287 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Completed 
checkpoint 2 for job d9bd270aefde98f6b9aec07f39e6e08b (4201 bytes, 
checkpointDuration=2246 ms, finalizationTime=12 ms).
   ```
   
   I'm currently testing if my patch would also work when backported to 1.18. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-15 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1857706468

   @snuyanzin I'm also looping you in here
   
   > I'm currently testing if my patch would also work when backported to 1.18.
   
   Newsflash: I didn't work 😭  Which in hindsight, is quite obvious: `master` 
has 
https://github.com/apache/flink/commit/492a886248208904276fcd2bda138a79c86bc71c 
merged, which influences the dependencies. 
   
   Let me do some more tests and I'll get back here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-15 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1857914683

   There are two possibilities to solve this:
   
   1. Backport https://issues.apache.org/jira/browse/FLINK-33704 to 
`release-1.18` - This resolves the issue for 1.18
   2. Merge the commit in my updated PR 
https://github.com/apache/flink/pull/23920/commits/7c609007959cfdc2d3e6bd9634c1576b437a611a
 that excludes Guava from `flink-fs-hadoop-shaded` and includes it, together 
with the needed `com.google.guava:failureaccess` for `flink-gs-fs-hadoop`. 
   
   Both of these approaches make checkpointing work again, I've validated that 
locally. In `master` (so the upcoming Flink 1.19.0 release) this problem does 
not appear.
   
   @JingGe @singhravidutt  @snuyanzin Any preference? I'm leaning towards 1, 
since that would be the same solution for both 1.18 and 1.19, with option 2 
being a specific fix for 1.18 only. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-15 Thread via GitHub


singhravidutt commented on code in PR #23489:
URL: https://github.com/apache/flink/pull/23489#discussion_r1428174024


##
flink-filesystems/flink-gs-fs-hadoop/pom.xml:
##
@@ -188,6 +212,29 @@ under the License.
shade


+   
+   
+   
org.apache.flink:flink-fs-hadoop-shaded
+   


Review Comment:
   ```
   [WARNING] flink-fs-hadoop-shaded-1.19-SNAPSHOT.jar, guava-32.1.2-jre.jar 
define 1837 overlapping classes: 
   [WARNING]   - com.google.common.annotations.Beta
   [WARNING]   - com.google.common.annotations.GwtCompatible
   [WARNING]   - com.google.common.annotations.GwtIncompatible
   [WARNING]   - com.google.common.annotations.VisibleForTesting
   [WARNING]   - com.google.common.base.Absent
   [WARNING]   - com.google.common.base.AbstractIterator
   [WARNING]   - com.google.common.base.AbstractIterator$1
   [WARNING]   - com.google.common.base.AbstractIterator$State
   [WARNING]   - com.google.common.base.Ascii
   [WARNING]   - com.google.common.base.CaseFormat
   [WARNING]   - 1827 more...
   ```
   I see this while building the package. My interpretation if it is that 
because `flink-fs-hadoop-shaded` is shaded jar AND it's not relocating guava 
classes. Shaded jar contains classes of guava. Hence just excluding guava as 
transitive dependency from module:`flink-fs-hadoop-shaded` is not enough.
   
   `flink-gs-fs-hadoop` will contain two implementation of guava classes i.e. 
`com.google.common.*` one coming from `flink-fs-hadoop-shaded` which will be 
from guava version `v27.1` and other from guava `v32.1.2`. As  
`fun:buildOrThrow` is not available in with `v27.` is causes runtime failure.
   
   Hence we have to either repackage every dependency of 
`flink-fs-hadoop-shaded` and then add as a dependency or exclude the jars 
manually.
   
   What are your thoughts on that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-15 Thread via GitHub


singhravidutt commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1858146657

   Licensing issue is fixed in the latest commit but I still feel we should 
exclude guava specific jars coming from `flink-fs-hadoop-shaded`. @JingGe 
@MartijnVisser 
   
   https://github.com/apache/flink/pull/23489#discussion_r1428174024


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-18 Thread via GitHub


MartijnVisser commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1859734539

   Superseded by https://issues.apache.org/jira/browse/FLINK-33704 - Thanks for 
the help/thinking on this @singhravidutt 
   
   @singhravidutt @cnauroth Do you have any bandwidth to add a GCS connector 
test to Flink?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-18 Thread via GitHub


MartijnVisser closed pull request #23489: [ FLINK-33603][FileSystems] shade 
guava in gs-fs filesystem
URL: https://github.com/apache/flink/pull/23489


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [ FLINK-33603][FileSystems] shade guava in gs-fs filesystem [flink]

2023-12-05 Thread via GitHub


singhravidutt commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1840388220

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org