luoyuxia commented on code in PR #2555:
URL: https://github.com/apache/fluss/pull/2555#discussion_r2758804785


##########
website/docs/quickstart/lakehouse.md:
##########


Review Comment:
   I also ran into a Parquet-related conflict. Shading Parquet resolved it:
   ```
   <relocation>
       <pattern>org.apache.parquet</pattern>
       <shadedPattern>org.apache.iceberg.shaded.org.apache.parquet</shadedPattern>
   </relocation>
   ```



##########
website/docs/quickstart/lakehouse.md:
##########


Review Comment:
   We can remove this line since it's already in `streaming` mode.



##########
website/docs/quickstart/lakehouse.md:
##########


Review Comment:
   Maybe in the next release we'll consider not bundling Iceberg-related classes in `fluss-lake-fluss`, just as we do for Paimon in #2531.



##########
website/docs/quickstart/lakehouse.md:
##########
@@ -155,37 +155,60 @@ mkdir fluss-quickstart-iceberg
 cd fluss-quickstart-iceberg
 ```
 
-2. Create a `lib` directory and download the required Hadoop jar file:
+2. Create directories and download required jars:
 
 ```shell
-mkdir lib
-wget -O lib/hadoop-apache-3.3.5-2.jar https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar
-```
+mkdir -p lib opt
 
-This jar file provides Hadoop 3.3.5 dependencies required for Iceberg's Hadoop catalog integration.
+# Flink connectors
+wget -O lib/flink-faker.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar

Review Comment:
   Can we keep the version in the file name so that it's easy for users to track?
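
   One way to follow this suggestion, as a minimal sketch: derive the local file name from the download URL so the version stays visible. The helper name `versioned_target` is hypothetical and not part of the PR; the URL is the one from the quoted diff.

   ```shell
   # Derive the local target path from the URL so the version stays in the file name.
   # versioned_target is a hypothetical helper used only for illustration.
   versioned_target() {
     printf 'lib/%s\n' "$(basename "$1")"
   }

   faker_url="https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar"
   faker_target="$(versioned_target "$faker_url")"
   echo "$faker_target"

   # Download step (needs network access), left commented out in this sketch:
   # wget -O "$faker_target" "$faker_url"
   ```

   The same helper could be applied to the other jar URLs, so that every file under `lib` carries its version.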



##########
website/docs/quickstart/lakehouse.md:
##########
@@ -199,9 +222,8 @@ services:
         datalake.iceberg.warehouse: /tmp/iceberg
     volumes:
       - shared-tmpfs:/tmp/iceberg

Review Comment:
   From Fluss 0.9, we will also need to mount `/tmp/fluss`:
   ```
   shared-tmpfs:/tmp/fluss
   ```



##########
website/docs/quickstart/lakehouse.md:
##########
@@ -220,9 +242,11 @@ services:
         datalake.iceberg.warehouse: /tmp/iceberg
     volumes:
       - shared-tmpfs:/tmp/iceberg

Review Comment:
   From Fluss 0.9, we will also need to mount `/tmp/fluss`:
   ```
   shared-tmpfs:/tmp/fluss
   ```



##########
website/docs/quickstart/lakehouse.md:
##########


Review Comment:
   I encountered
   ```
   inherit an implementation of the resolved method 'abstract void generate(com.fasterxml.jackson.core.JsonGenerator)' of interface org.apache.iceberg.util.JsonUtil$ToJson.
   ```
   when running this SQL:
   
   The reason is that `iceberg-flink` introduces a `JsonUtil` that already shades `com.fasterxml.jackson` to `org.apache.iceberg.shaded.com.fasterxml.jackson`.
   `fluss-lake-iceberg` also introduces a `JsonUtil`, which doesn't shade `com.fasterxml.jackson`, so the two classes conflict.
   
   To solve it, we need to shade `com.fasterxml.jackson` in our `fluss-lake-iceberg` module:
   ```
   <configuration>
       <artifactSet>
           <includes>
               <include>*:*</include>
           </includes>
       </artifactSet>
       <relocations>
           <relocation>
               <pattern>com.fasterxml.jackson</pattern>
               <shadedPattern>org.apache.iceberg.shaded.com.fasterxml.jackson</shadedPattern>
           </relocation>
           <relocation>
               <pattern>org.apache.parquet</pattern>
               <shadedPattern>org.apache.iceberg.shaded.org.apache.parquet</shadedPattern>
           </relocation>
       </relocations>
       <filters>
           <filter>
               <artifact>*</artifact>
               <excludes>
                   <exclude>LICENSE</exclude>
                   <exclude>NOTICE</exclude>
                   <exclude>META-INF/versions/21/**</exclude>
                   <exclude>META-INF/versions/17/**</exclude>
               </excludes>
           </filter>
       </filters>
   </configuration>
   ```
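
   A quick way to check whether a relocation like this took effect is to list the jar entries and look for classes that still sit under the original package path. This is a minimal sketch: the helper name `find_unshaded` is hypothetical, the entry names are made up for illustration, and the jar path assumes the `lib` layout used in this PR.

   ```shell
   # Print jar entries that still live under an unrelocated package path.
   # Reads one entry name per line on stdin; $1 is the package path to look for.
   # find_unshaded is a hypothetical helper used only for illustration.
   find_unshaded() {
     grep -E "^$1/" || true
   }

   # Illustrative entry names (assumed, not taken from a real jar listing).
   printf '%s\n' \
     'org/apache/iceberg/util/JsonUtil.class' \
     'com/fasterxml/jackson/core/JsonGenerator.class' \
     'org/apache/iceberg/shaded/com/fasterxml/jackson/core/JsonGenerator.class' \
     | find_unshaded com/fasterxml/jackson

   # Against a real jar (needs a JDK on the PATH):
   # jar tf lib/fluss-lake-iceberg.jar | find_unshaded com/fasterxml/jackson
   ```

   Any output means some classes remain under the original package and could still clash with the shaded copies.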
   
   
   



##########
website/docs/quickstart/lakehouse.md:
##########
@@ -155,37 +155,60 @@ mkdir fluss-quickstart-iceberg
 cd fluss-quickstart-iceberg
 ```
 
-2. Create a `lib` directory and download the required Hadoop jar file:
+2. Create directories and download required jars:
 
 ```shell
-mkdir lib
-wget -O lib/hadoop-apache-3.3.5-2.jar https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar
-```
+mkdir -p lib opt
 
-This jar file provides Hadoop 3.3.5 dependencies required for Iceberg's Hadoop catalog integration.
+# Flink connectors
+wget -O lib/flink-faker.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
+wget -O lib/fluss-flink.jar "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar";
+wget -O lib/iceberg-flink-runtime.jar "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.10.1/iceberg-flink-runtime-1.20-1.10.1.jar";
 
-:::info
-The `lib` directory serves as a staging area for additional jars needed by the Fluss coordinator server. The docker-compose configuration (see step 3) mounts this directory and copies all jars to `/opt/fluss/plugins/iceberg/` inside the coordinator container at startup.
+# Fluss lake plugin
+wget -O lib/fluss-lake-iceberg.jar https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-iceberg/$FLUSS_DOCKER_VERSION$/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar
+
+# Hadoop filesystem support
+wget -O lib/hadoop-apache.jar https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar
+wget -O lib/failsafe.jar https://repo1.maven.org/maven2/dev/failsafe/failsafe/3.3.2/failsafe-3.3.2.jar
 
+# Tiering service
+wget -O opt/fluss-flink-tiering.jar https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar
+```
+
+:::info
 You can add more jars to this `lib` directory based on your requirements:
 - **Cloud storage support**: For AWS S3 integration with Iceberg, add the corresponding Iceberg bundle jars (e.g., `iceberg-aws-bundle`)
 - **Custom Hadoop configurations**: Add jars for specific HDFS distributions or custom authentication mechanisms
 - **Other catalog backends**: Add jars needed for alternative Iceberg catalog implementations (e.g., Rest, Hive, Glue)
-
-Any jar placed in the `lib` directory will be automatically loaded by the Fluss coordinator server, making it available for Iceberg integration.
 :::
 
-3. Create a `docker-compose.yml` file with the following content:
+3. Create a `Dockerfile` for the custom Flink image:

Review Comment:
   It'll be hard for users to build the Flink image themselves. IIUC, the problem is that the Flink image runs as user `flink`, which causes the permission problem. I solved it with the following content:
   ```
     jobmanager:
       image: flink:1.20-scala_2.12-java17
       ports:
         - "8083:8081"
       entrypoint: ["/bin/bash", "-c"]
       command: >
         "sed -i 's/exec $$(drop_privs_cmd)//g' /docker-entrypoint.sh && 
          cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; 
          /docker-entrypoint.sh jobmanager"
       environment:
         - |
           FLINK_PROPERTIES=
           jobmanager.rpc.address: jobmanager
       volumes:
         - shared-tmpfs:/tmp/iceberg
         - ./lib:/tmp/jars  # Mount the JARs directory
   
     taskmanager:
       image: flink:1.20-scala_2.12-java17
       depends_on:
         - jobmanager
       entrypoint: ["/bin/bash", "-c"]
       command: >
         "sed -i 's/exec $$(drop_privs_cmd)//g' /docker-entrypoint.sh && 
          cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; 
          /docker-entrypoint.sh taskmanager"
       environment:
         - |
           FLINK_PROPERTIES=
           jobmanager.rpc.address: jobmanager
           taskmanager.numberOfTaskSlots: 10
           taskmanager.memory.process.size: 2048m
           taskmanager.memory.framework.off-heap.size: 256m
       volumes:
         - shared-tmpfs:/tmp/iceberg
         - ./lib:/tmp/jars  # Mount the JARs directory
   ```
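
   To see what the `sed` call in this workaround does, here is a small sketch that runs it against a sample line. The sample line is an assumption about what the image's `/docker-entrypoint.sh` contains, not a copy of it. In the Compose file, `$$` is Compose's escape for a literal `$`, so the shell inside the container sees `$(drop_privs_cmd)`.

   ```shell
   # Sample entrypoint line (assumed for illustration; the real script may differ).
   sample_line='exec $(drop_privs_cmd) "$FLINK_HOME/bin/jobmanager.sh" start-foreground'

   # Same substitution as in the compose command, with $$ already unescaped to $.
   # In a basic regular expression, $ in the middle of the pattern and the
   # parentheses are matched literally.
   stripped="$(printf '%s\n' "$sample_line" | sed 's/exec $(drop_privs_cmd)//g')"
   echo "$stripped"
   ```

   With the `exec $(drop_privs_cmd)` prefix removed, the entrypoint presumably starts the process as the container's initial user instead of switching to `flink`, which is what lets the copied jars be read without a custom image.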



##########
website/docs/quickstart/lakehouse.md:
##########


Review Comment:
   Since we use the standard Flink image, there is no `tree` command in the `taskmanager`. I think we can remove this part.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
