Copilot commented on code in PR #2576:
URL: https://github.com/apache/fluss/pull/2576#discussion_r2769872018


##########
website/docs/quickstart/lakehouse.md:
##########
@@ -677,6 +756,12 @@ If you wish to query only the data stored in Paimon—offering high-performance
 This approach also enables all the optimizations and features of a Flink Paimon table source, including [system table](https://paimon.apache.org/docs/1.3/concepts/system-tables/) such as `datalake_enriched_orders$lake$snapshots`.

Review Comment:
   Hardcoded Paimon version: the URL uses `1.3` instead of the `$PAIMON_VERSION_SHORT$` placeholder. This is inconsistent with lakehouse-storage.md (lines 38 and 69), where similar Paimon documentation URLs were updated to use the placeholder. Update the URL to `https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/concepts/system-tables/` to keep versions consistent across the documentation.
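   Following the suggestion convention used elsewhere in this review, the fix could be applied directly (assuming the comment anchors on the line containing the URL):
   ```suggestion
   This approach also enables all the optimizations and features of a Flink Paimon table source, including [system table](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/concepts/system-tables/) such as `datalake_enriched_orders$lake$snapshots`.
   ```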



##########
website/docs/quickstart/lakehouse.md:
##########
@@ -312,23 +349,69 @@ Congratulations, you are all set!
 
 First, use the following command to enter the Flink SQL CLI Container:
 ```shell
-docker compose exec jobmanager ./sql-client
+docker compose exec jobmanager ./bin/sql-client.sh
 ```
 
-**Note**:
-To simplify this guide, three temporary tables have been pre-created with `faker` connector to generate data.
-You can view their schemas by running the following commands:
+To simplify this guide, we will create three temporary tables with `faker` connector to generate data:
 
 ```sql title="Flink SQL"
-SHOW CREATE TABLE source_customer;
+CREATE TEMPORARY TABLE source_order (
+    `order_key` BIGINT,
+    `cust_key` INT,
+    `total_price` DECIMAL(15, 2),
+    `order_date` DATE,
+    `order_priority` STRING,
+    `clerk` STRING
+) WITH (
+  'connector' = 'faker',
+  'rows-per-second' = '10',
+  'number-of-rows' = '10000',
+  'fields.order_key.expression' = '#{number.numberBetween ''0'',''100000000''}',
+  'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+  'fields.total_price.expression' = '#{number.randomDouble ''3'',''1'',''1000''}',
+  'fields.order_date.expression' = '#{date.past ''100'' ''DAYS''}',
+  'fields.order_priority.expression' = '#{regexify ''(low|medium|high){1}''}',
+  'fields.clerk.expression' = '#{regexify ''(Clerk1|Clerk2|Clerk3|Clerk4){1}''}'
+);
+```
+
+```sql title="Flink SQL"
+CREATE TEMPORARY TABLE source_customer (
+    `cust_key` INT,
+    `name` STRING,
+    `phone` STRING,
+    `nation_key` INT NOT NULL,
+    `acctbal` DECIMAL(15, 2),
+    `mktsegment` STRING,
+    PRIMARY KEY (`cust_key`) NOT ENFORCED
+) WITH (
+  'connector' = 'faker',
+  'number-of-rows' = '200',
+  'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+  'fields.name.expression' = '#{funnyName.name}',
+  'fields.nation_key.expression' = '#{number.numberBetween ''1'',''5''}',
+  'fields.phone.expression' = '#{phoneNumber.cellPhone}',
+  'fields.acctbal.expression' = '#{number.randomDouble ''3'',''1'',''1000''}',
+  'fields.mktsegment.expression' = '#{regexify ''(AUTOMOBILE|BUILDING|FURNITURE|MACHINERY|HOUSEHOLD){1}''}'
+);
 ```
 
 ```sql title="Flink SQL"
-SHOW CREATE TABLE source_order;
+CREATE TEMPORARY TABLE `source_nation` (
+  `nation_key` INT NOT NULL,
+  `name`       STRING,

Review Comment:
   Inconsistent whitespace: there is an extra run of spaces after `name`, before the column type, in the CREATE TABLE statement. The other columns don't have this extra whitespace; for consistency, use a single space after the backtick-quoted column name, matching the pattern of the other columns.
   ```suggestion
     `name` STRING,
   ```



##########
website/docs/quickstart/lakehouse.md:
##########
@@ -72,37 +100,50 @@ services:
         datalake.paimon.warehouse: /tmp/paimon
     volumes:
       - shared-tmpfs:/tmp/paimon
+      - shared-tmpfs:/tmp/fluss
   zookeeper:
     restart: always
     image: zookeeper:3.9.2
-  #end
-  #begin Flink cluster
   jobmanager:
-    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+    image: flink:1.20-scala_2.12-java17
     ports:
       - "8083:8081"
-    command: jobmanager
+    entrypoint: ["/bin/bash", "-c"]
+    command: >
+      "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+       cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+       cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+       /docker-entrypoint.sh jobmanager"
     environment:
       - |
         FLINK_PROPERTIES=
         jobmanager.rpc.address: jobmanager
     volumes:
       - shared-tmpfs:/tmp/paimon
+      - shared-tmpfs:/tmp/fluss
+      - ./lib:/tmp/jars
+      - ./opt:/tmp/opt
   taskmanager:
-    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+    image: flink:1.20-scala_2.12-java17
     depends_on:
       - jobmanager
-    command: taskmanager
+    entrypoint: ["/bin/bash", "-c"]
+    command: >
+      "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+       cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+       cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+       /docker-entrypoint.sh taskmanager"

Review Comment:
   The `jobmanager` (and similarly the `taskmanager`) service explicitly disables Flink's built-in privilege dropping by running `sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh`, so the Flink processes run as `root` inside the container instead of as an unprivileged user. If an attacker gains remote code execution in Flink (e.g., via a vulnerability in the JobManager/TaskManager or a malicious connector), that code would execute with full root privileges in the container, making it easier to tamper with files or mounted volumes, or to escalate further. To reduce risk, avoid modifying the official entrypoint to remove `drop_privs_cmd`, and run Flink under a non-root user as intended by the base image.
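   One alternative, sketched below as a compose fragment (the jar filename `paimon-flink.jar` is illustrative, not from the PR), keeps the stock entrypoint and its privilege drop intact by bind-mounting each extra jar directly into `/opt/flink/lib` instead of copying it in from a patched root shell:
   ```yaml
     jobmanager:
       image: flink:1.20-scala_2.12-java17
       ports:
         - "8083:8081"
       # Keep the stock entrypoint so drop_privs_cmd still runs and Flink
       # starts as the unprivileged flink user.
       command: jobmanager
       environment:
         - |
           FLINK_PROPERTIES=
           jobmanager.rpc.address: jobmanager
       volumes:
         - shared-tmpfs:/tmp/paimon
         - shared-tmpfs:/tmp/fluss
         # Mount connector jars file-by-file (name is illustrative) rather
         # than copying them into /opt/flink/lib as root at container start.
         - ./lib/paimon-flink.jar:/opt/flink/lib/paimon-flink.jar:ro
   ```
   The same pattern would apply to the `taskmanager` service.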



##########
website/docs/quickstart/lakehouse.md:
##########
@@ -726,32 +811,6 @@ The result looks like:
 ```
You can execute the real-time analytics query multiple times, and the results will vary with each run as new data is continuously written to Fluss in real-time.
 
-Finally, you can use the following command to view the files stored in Paimon:
-```shell
-docker compose exec taskmanager tree /tmp/paimon/fluss.db
-```
-
-**Sample Output:**
-```shell
-/tmp/paimon/fluss.db
-└── datalake_enriched_orders
-    ├── bucket-0
-    │   ├── changelog-aef1810f-85b2-4eba-8eb8-9b136dec5bdb-0.orc
-    │   └── data-aef1810f-85b2-4eba-8eb8-9b136dec5bdb-1.orc
-    ├── manifest
-    │   ├── manifest-aaa007e1-81a2-40b3-ba1f-9df4528bc402-0
-    │   ├── manifest-aaa007e1-81a2-40b3-ba1f-9df4528bc402-1
-    │   ├── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-0
-    │   ├── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-1
-    │   └── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-2
-    ├── schema
-    │   └── schema-0
-    └── snapshot
-        ├── EARLIEST
-        ├── LATEST
-        └── snapshot-1
-```
-
 The files adhere to Paimon's standard format, enabling seamless querying with other engines such as [Spark](https://paimon.apache.org/docs/1.3/spark/quick-start/) and [Trino](https://paimon.apache.org/docs/1.3/ecosystem/trino/).

Review Comment:
   Hardcoded Paimon version: the Spark and Trino URLs use `1.3` instead of the `$PAIMON_VERSION_SHORT$` placeholder. This is inconsistent with lakehouse-storage.md (lines 38 and 69), where similar Paimon documentation URLs were updated to use the placeholder. Update both URLs to use `$PAIMON_VERSION_SHORT$` to keep versions consistent across the documentation.
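   Following the suggestion convention used elsewhere in this review, the corrected paragraph could read (assuming the comment anchors on the line containing both URLs):
   ```suggestion
   The files adhere to Paimon's standard format, enabling seamless querying with other engines such as [Spark](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/spark/quick-start/) and [Trino](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/ecosystem/trino/).
   ```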



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
