yihua commented on code in PR #18867:
URL: https://github.com/apache/hudi/pull/18867#discussion_r3315484968
##########
website/docs/lance_file_format.md:
##########
@@ -39,18 +68,21 @@ TBLPROPERTIES (
.option("hoodie.datasource.write.recordkey.field", "id")
.option("hoodie.record.merger.impls",
"org.apache.hudi.DefaultSparkRecordMerger")
- .option("hoodie.datasource.write.base.file.format", "lance")
+ .option("hoodie.table.base.file.format", "lance")
.mode("overwrite")
.save("/path/to/my_ai_table"))
```
### Required Dependencies
-Add the Lance Spark bundle to your Spark classpath:
+The Lance JAR is not bundled in Hudi. Add the appropriate Lance Spark bundle to your Spark classpath:
-| Component | Maven Coordinates |
-|:----------|:-----------------|
-| Lance Spark Bundle (Spark 3.5) | `org.lance:lance-spark-bundle-3.5_2.12:0.4.0` |
+| Spark Version | Maven Coordinates |
+|:--------------|:-----------------|
+| Spark 3.4 | `org.lance:lance-spark-3.4_2.12:0.4.0` |
+| Spark 3.5 | `org.lance:lance-spark-3.5_2.12:0.4.0` |
+| Spark 4.0 | `org.lance:lance-spark-4.0_2.13:0.4.0` |
Review Comment:
Good catch — fixed in d78a01b8cc35.
Both artifacts are actually published on Maven Central under the same Lance
version — they serve different purposes:
- `org.lance:lance-spark-<spark>_<scala>:0.4.0` — the bare connector
(declared as a Maven dependency in Hudi's `pom.xml`).
- `org.lance:lance-spark-bundle-<spark>_<scala>:0.4.0` — the shaded uber-JAR
with transitive deps, intended for `spark-shell` / `spark-submit --jars`.
Since the section is about adding the JAR to a Spark classpath, I aligned
the table to the `-bundle-` artifacts (matching the shell example) and added a
paragraph explaining the connector-vs-bundle distinction, so users coming from
Hudi's pom understand why both names appear in the wild. I verified the naming
against Hudi's own `pom.xml` (`lance.spark.artifact = lance-spark-<spark>_<scala>`)
and the `hudi-examples/.../vector_blob_demo/README.md`, which explicitly
documents both names.
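
To make the naming difference concrete, here is a minimal sketch that builds either coordinate from a Spark version. The helper `lance_coordinate` and the lookup table are hypothetical names for illustration only; the Spark-to-Scala pairs and the `0.4.0` version are taken from the table in this diff, and the sketch assumes the `-bundle-` artifacts follow the same `<spark>_<scala>` pattern for every listed Spark version.

```python
# Hypothetical helper, not part of Hudi or Lance.
# Version pairs and 0.4.0 come from the table in the diff above.
LANCE_VERSION = "0.4.0"
SCALA_FOR_SPARK = {"3.4": "2.12", "3.5": "2.12", "4.0": "2.13"}


def lance_coordinate(spark_version: str, bundle: bool = True) -> str:
    """Return the Maven coordinate for the Lance Spark artifact.

    bundle=True  -> shaded uber-JAR, for spark-shell / spark-submit --jars
    bundle=False -> bare connector, as declared in a Maven pom
    """
    scala = SCALA_FOR_SPARK[spark_version]  # KeyError for unlisted versions
    artifact = "lance-spark-bundle" if bundle else "lance-spark"
    return f"org.lance:{artifact}-{spark_version}_{scala}:{LANCE_VERSION}"


if __name__ == "__main__":
    for spark in SCALA_FOR_SPARK:
        print(spark, lance_coordinate(spark), lance_coordinate(spark, bundle=False))
```

The only difference between the two names is the `-bundle` infix, which is why both show up when searching Maven Central for the same Lance version.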