Re: Setup of Scala/Flink project using Bazel
Hi Austin,

In the end I added the following target override for Scala:

```
maven_install(
    artifacts = [
        # testing
        maven.artifact(
            group = "com.google.truth",
            artifact = "truth",
            version = "1.0.1",
        ),
    ] + flink_artifacts(
        addons = FLINK_ADDONS,
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ) + flink_testing_artifacts(
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ),
    fetch_sources = True,
    # This override results in Scala-related classes being removed
    # from the deploy jar as required (?)
    override_targets = {
        "org.scala-lang.scala-library": "@io_bazel_rules_scala_scala_library//:io_bazel_rules_scala_scala_library",
        "org.scala-lang.scala-reflect": "@io_bazel_rules_scala_scala_reflect//:io_bazel_rules_scala_scala_reflect",
        "org.scala-lang.scala-compiler": "@io_bazel_rules_scala_scala_compiler//:io_bazel_rules_scala_scala_compiler",
        "org.scala-lang.modules.scala-parser-combinators_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_parser_combinators//:io_bazel_rules_scala_scala_parser_combinators",
        "org.scala-lang.modules.scala-xml_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_xml//:io_bazel_rules_scala_scala_xml",
    },
    repositories = MAVEN_REPOSITORIES,
)
```

and now it works as expected, meaning:

```
bazel build //src/main/scala/org/example:word_count_deploy.jar
```

produces a jar with both Flink and Scala-related classes removed (since they are provided by the runtime). I did a quick check and the Flink job runs just fine in a local cluster.

It would be nice if the community could confirm that this is indeed the way to build Flink-based Scala applications...

BTW, I updated the repo with the abovementioned override, in case you want to give it a try: https://github.com/salvalcantara/bazel-flink-scala

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
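As a mental model for why the override removes the Scala classes, one can think of `override_targets` as a coordinate-to-label map consulted during resolution: an overridden Maven coordinate is answered with the given Bazel label (here, a rules_scala-provided target) instead of being fetched into `@maven` and bundled into the deploy jar. The sketch below is only an illustration of that idea, not rules_jvm_external's actual implementation:

```python
# Toy model (illustration only, NOT rules_jvm_external's real code) of what
# override_targets does: overridden coordinates resolve to the given label,
# everything else resolves to a default @maven target.
OVERRIDES = {
    "org.scala-lang.scala-library":
        "@io_bazel_rules_scala_scala_library//:io_bazel_rules_scala_scala_library",
}

def resolve(coordinate):
    """Return the overriding label if one exists, else a default @maven label."""
    if coordinate in OVERRIDES:
        return OVERRIDES[coordinate]
    mangled = coordinate.replace(".", "_").replace("-", "_").replace(":", "_")
    return "@maven//:" + mangled

print(resolve("org.scala-lang.scala-library"))  # the rules_scala target
print(resolve("org.apache.flink:flink-core"))   # @maven//:org_apache_flink_flink_core
```

Since the rules_scala targets are already on the toolchain/runtime side, the deploy jar ends up without Scala classes.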
Re: Setup of Scala/Flink project using Bazel
That would be awesome, Austin, thanks again for your help on that. In the meantime, I also filed an issue in the `rules_scala` repo: https://github.com/bazelbuild/rules_scala/issues/1268.
Re: Setup of Scala/Flink project using Bazel
I know @Aaron Levin is using `rules_scala` for building Flink apps, perhaps he can help us out here (and I hope he doesn't mind the ping).
Re: Setup of Scala/Flink project using Bazel
Yikes, I see what you mean. I also cannot get `neverlink` or adding the org.scala.lang artifacts to the `deploy_env` to remove them from the uber jar.

I'm not super familiar with sbt/Scala, but do you know how exactly the assembly `includeScala` option works? Is it just a flag that is passed to scalac?

I've found where rules_scala defines how to call `scalac`, but am lost here [1].

Best,
Austin

[1]: https://github.com/bazelbuild/rules_scala/blob/c9cc7c261d3d740eb91ef8ef048b7cd2229d12ec/scala/private/rule_impls.bzl#L72-L139
Re: Setup of Scala/Flink project using Bazel
Hi Austin,

Yep, removing Flink dependencies is working well, as you pointed out.

The problem now is that I would also need to remove the Scala library... by inspecting the jar you will see a lot of Scala-related classes. If you take a look at the end of the build.sbt file, I have

```
// exclude Scala library from assembly
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
```

so the fat jar generated by running `sbt assembly` does not contain Scala-related classes, which are also "provided". You can compare the bazel-built jar with the one built by sbt:

```
$ jar tf target/scala-2.12/bazel-flink-scala-assembly-0.1-SNAPSHOT.jar
META-INF/MANIFEST.MF
org/
org/example/
BUILD
log4j.properties
org/example/WordCount$$anon$1$$anon$2.class
org/example/WordCount$$anon$1.class
org/example/WordCount$.class
org/example/WordCount.class
```

Note that there are neither Flink nor Scala classes. In the jar generated by Bazel, however, I can still see Scala classes...
Re: Setup of Scala/Flink project using Bazel
Hi Salva,

I think you're almost there. Confusion is definitely not helped by the ADDONS/PROVIDED_ADDONS thingy – I think I tried to get too fancy with that in the linked thread.

I think the only thing you have to do differently is to adjust the target you are building/deploying – instead of `//src/main/scala/org/example:flink_app_deploy.jar`, your target with the provided env applied is `//src/main/scala/org/example:word_count_deploy.jar`.

I've verified this in the following ways:

1. Building and checking the JAR itself:

```
bazel build //src/main/scala/org/example:word_count_deploy.jar
jar -tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep flink
```

This shows only the tools/flink/NoOp class.

2. Running the word count jar locally, to ensure the main class is picked up correctly:

```
./bazel-bin/src/main/scala/org/example/word_count
USAGE: WordCount
```

3. Had fun with the Bazel query language [1], inspecting the difference in the dependencies between the deploy env and word_count_deploy.jar:

```
bazel query 'filter("@maven//:org_apache_flink.*", deps(//src/main/scala/org/example:word_count_deploy.jar) except deps(//:default_flink_deploy_env))'
INFO: Empty results
Loading: 0 packages loaded
```

This is to say that there are no Flink dependencies in the deploy JAR that are not accounted for in the deploy env.

So I think you're all good, but let me know if I've misunderstood! Or if you find a better way of doing the provided deps – I'd be very interested!

Best,
Austin

[1]: https://docs.bazel.build/versions/master/query.html
Re: Setup of Scala/Flink project using Bazel
Hi Austin,

I followed your instructions and gave `rules_jvm_external` a try. Overall, I think I advanced a bit, but I'm not quite there yet. I have followed the link [1] given by Matthias, making the necessary changes to my repo:

https://github.com/salvalcantara/bazel-flink-scala

In particular, the relevant (Bazel) BUILD file looks like this:

```
package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")

filegroup(
    name = "scala-main-srcs",
    srcs = glob(["*.scala"]),
)

scala_library(
    name = "flink_app",
    srcs = [":scala-main-srcs"],
    deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)

java_binary(
    name = "word_count",
    srcs = ["//tools/flink:noop"],
    deploy_env = ["//:default_flink_deploy_env"],
    main_class = "org.example.WordCount",
    deps = [
        ":flink_app",
    ],
)
```

The idea is to use `deploy_env` within `java_binary` for providing the Flink dependencies. This causes those dependencies to get removed from the final fat jar that one gets by running:

```
bazel build //src/main/scala/org/example:flink_app_deploy.jar
```

The problem now is that the jar still includes the Scala library, which should also be dropped from the jar, as it is part of the provided dependencies within the Flink cluster. I am reading the blog post in [2] without luck yet...

Regards,

Salva

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-anyone-have-an-example-of-Bazel-working-with-Flink-td35898.html

[2] https://yishanhe.net/address-dependency-conflict-for-bazel-built-scala-spark/
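For completeness, the `//:default_flink_deploy_env` target consumed by `deploy_env` above would be another `java_binary` whose runtime classpath declares what the Flink cluster provides. The following is only a sketch of what it might look like (the exact dependency list and the `NoOp` main class are assumptions, not copied from the repo):

```
# Root BUILD file (sketch): a dummy binary whose classpath defines what the
# Flink runtime provides; anything reachable from here is stripped from the
# word_count *_deploy.jar.
java_binary(
    name = "default_flink_deploy_env",
    srcs = ["//tools/flink:noop"],
    main_class = "tools.flink.NoOp",  # never actually run
    deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)
```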
Re: Setup of Scala/Flink project using Bazel
Great! Feel free to post back if you run into anything else or come up with a nice template – I agree it would be a nice thing for the community to have.

Best,
Austin
Re: Setup of Scala/Flink project using Bazel
Hey Austin,

There was no special reason for vendoring using `bazel-deps`, really. I just took another project as a reference for mine, and that project was already using `bazel-deps`. I am going to give `rules_jvm_external` a try, and hopefully I can make it work!

Regards,
Salva
Re: Setup of Scala/Flink project using Bazel
Hey Salva,

This appears to be a bug in the `bazel-deps` tool, caused by mixing Scala and Java dependencies. The tool seems to use the same target name for both, and thus produces duplicate targets (one for Scala and one for Java). If you look at the dict lines that are reported as conflicting, you'll see the duplicate "vendor/org/apache/flink:flink_clients" target:

```
"vendor/org/apache/flink:flink_clients": ["lang||java","name||//vendor/org/apache/flink:flink_clients", ...],
"vendor/org/apache/flink:flink_clients": ["lang||scala:2.12.11","name||//vendor/org/apache/flink:flink_clients", ...],
```

Can I ask what made you choose the `bazel-deps` tool instead of the official bazelbuild/rules_jvm_external [1]? That might be a bit more verbose, but has better support and supports Scala as well.

Alternatively, you might look into customizing the target templates for `bazel-deps` to suffix targets with the lang? Something like:

```
_JAVA_LIBRARY_TEMPLATE = """
java_library(
    name = "{name}_java",
..."""

_SCALA_IMPORT_TEMPLATE = """
scala_import(
    name = "{name}_scala",
..."""
```

Best,
Austin

[1]: https://github.com/bazelbuild/rules_jvm_external
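The collision can be sketched outside Bazel as follows. This is a toy Python model of the naming scheme, not bazel-deps' actual code; note that a Python dict comprehension silently lets the last duplicate win, whereas a Starlark dict literal with duplicate keys is an error, which is exactly the failure reported above:

```python
# Toy reproduction of the bazel-deps clash: the same artifact resolved once
# for Java and once for Scala maps to one target name unless the language is
# appended, as in the suffixed templates suggested above.
deps = [
    {"coord": "org.apache.flink:flink-clients", "lang": "java"},
    {"coord": "org.apache.flink:flink-clients", "lang": "scala:2.12.11"},
]

def target_name(dep, suffix_with_lang=False):
    # Both resolutions share the same vendored path...
    base = "vendor/org/apache/flink:flink_clients"
    if suffix_with_lang:
        # ...unless "java" / "scala" is appended to the target name.
        return base + "_" + dep["lang"].split(":")[0]
    return base

plain = {target_name(d): d for d in deps}        # duplicate key: one entry survives
suffixed = {target_name(d, True): d for d in deps}

print(len(plain))     # 1
print(len(suffixed))  # 2
```

With the suffix, the generated dict has two distinct keys and the Starlark file loads cleanly.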
Re: Setup of Scala/Flink project using Bazel
Hi Matthias,

Thanks a lot for your reply. I am already aware of that reference, but it's not exactly what I need. What I'd like to have is the typical word count (hello world) app migrated from sbt to Bazel, in order to use it as a template for my Flink/Scala apps.
Re: Setup of Scala/Flink project using Bazel
> ```
> ming-scala] # provided
> flink-connector-kafka:
>   lang: java
>   version: "0.10.2"
> flink-test-utils:
>   lang: java
>   version: "0.10.2"
> ```
>
> For downloading the dependencies, I'm running
>
> ```
> bazel run //:parse generate -- --repo-root ~/Projects/bazel-flink-scala --sha-file vendor/workspace.bzl --target-file vendor/target_file.bzl --deps dependencies.yaml
> ```
>
> which runs just fine, but then when I try to build the project
>
> ```
> bazel build //:job
> ```
>
> I'm getting this error
>
> ```
> Starting local Bazel server and connecting to it...
> ERROR: Traceback (most recent call last):
>   File "/Users/salvalcantara/Projects/me/bazel-flink-scala/WORKSPACE", line 44, column 25, in
>     build_external_workspace(name = "vendor")
>   File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 258, column 91, in build_external_workspace
>     return build_external_workspace_from_opts(name = name, target_configs = list_target_data(), separator = list_target_data_separator(), build_header = build_header())
>   File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 251, column 40, in list_target_data
>     "vendor/org/apache/flink:flink_clients": ["lang||scala:2.12.11","name||//vendor/org/apache/flink:flink_clients","visibility||//visibility:public","kind||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
> Error: dictionary expression has duplicate key: "vendor/org/apache/flink:flink_clients"
> ERROR: error loading package 'external': Package 'external' contains errors
> INFO: Elapsed time: 3.644s
> INFO: 0 processes.
> FAILED: Build did NOT complete successfully (0 packages loaded)
> ```
>
> Why is that? Can anyone help? It would be great to have detailed instructions and project templates for Flink/Scala applications using Bazel. I've put everything together in the following repo: https://github.com/salvalcantara/bazel-flink-scala, feel free to send a PR or whatever.
>
> PS: Also posted on SO: https://stackoverflow.com/questions/67331792/setup-of-scala-flink-project-using-bazel
Setup of Scala/Flink project using Bazel
```
Starting local Bazel server and connecting to it...
ERROR: Traceback (most recent call last):
  File "/Users/salvalcantara/Projects/me/bazel-flink-scala/WORKSPACE", line 44, column 25, in
    build_external_workspace(name = "vendor")
  File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 258, column 91, in build_external_workspace
    return build_external_workspace_from_opts(name = name, target_configs = list_target_data(), separator = list_target_data_separator(), build_header = build_header())
  File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 251, column 40, in list_target_data
    "vendor/org/apache/flink:flink_clients": ["lang||scala:2.12.11","name||//vendor/org/apache/flink:flink_clients","visibility||//visibility:public","kind||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
Error: dictionary expression has duplicate key: "vendor/org/apache/flink:flink_clients"
ERROR: error loading package 'external': Package 'external' contains errors
INFO: Elapsed time: 3.644s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
```

Why is that? Can anyone help? It would be great to have detailed instructions and project templates for Flink/Scala applications using Bazel. I've put everything together in the following repo: https://github.com/salvalcantara/bazel-flink-scala, feel free to send a PR or whatever.

PS: Also posted on SO: https://stackoverflow.com/questions/67331792/setup-of-scala-flink-project-using-bazel