wgtmac commented on code in PR #614:
URL: https://github.com/apache/iceberg-cpp/pull/614#discussion_r3179180618


##########
src/iceberg/catalog/rest/error_handlers.cc:
##########
@@ -183,4 +186,52 @@ Status ViewCommitErrorHandler::Accept(const ErrorResponse& 
error) const {
   return DefaultErrorHandler::Accept(error);
 }
 
+const std::shared_ptr<ScanPlanErrorHandler>& ScanPlanErrorHandler::Instance() {

Review Comment:
   What about renaming it to `PlanErrorHandler` to be consistent with the Java 
parity?



##########
src/iceberg/catalog/rest/resource_paths.h:
##########
@@ -81,6 +81,19 @@ class ICEBERG_REST_EXPORT ResourcePaths {
   /// \brief Get the /v1/{prefix}/transactions/commit endpoint path.
   Result<std::string> CommitTransaction() const;
 
+  /// \brief Get the /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan 
endpoint
+  /// path.
+  Result<std::string> ScanPlan(const TableIdentifier& ident) const;

Review Comment:
   Can we follow the same name of Java impl to use `Plan` and `FetchScanTasks`? 
This will make it easier to keep them in sync in the future.



##########
src/iceberg/catalog/rest/resource_paths.cc:
##########
@@ -102,4 +102,27 @@ Result<std::string> ResourcePaths::CommitTransaction() 
const {
   return std::format("{}/v1/{}transactions/commit", base_uri_, prefix_);
 }
 
+Result<std::string> ResourcePaths::ScanPlan(const TableIdentifier& ident) 
const {
+  ICEBERG_ASSIGN_OR_RAISE(std::string encoded_namespace, 
EncodeNamespace(ident.ns));

Review Comment:
   We need to respect `namespace_separator_` just like other functions.



##########
src/iceberg/test/rest_json_serde_test.cc:
##########
@@ -1380,4 +1381,139 @@ INSTANTIATE_TEST_SUITE_P(
       return info.param.test_name;
     });
 
+// Helper: empty schema and specs for scan response tests that don't need 
partition
+// parsing.
+static Schema EmptySchema() { return Schema({}, 0); }
+static std::unordered_map<int32_t, std::shared_ptr<PartitionSpec>> 
EmptySpecs() {
+  return {};
+}
+
+// --- PlanTableScanResponse ---
+
+TEST(PlanTableScanResponseFromJsonTest, SubmittedStatusMissingOptionalFields) {

Review Comment:
   Can we switch to paramterized test here and port as many cases as possible 
from the Jave tests like `TestFetchScanTasksResponseParser.java`, 
`TestPlanTableScanResponseParser.java` and 
`TestPlanTableScanRequestParser.java`?



##########
src/iceberg/catalog/rest/types.h:
##########
@@ -295,4 +298,73 @@ struct ICEBERG_REST_EXPORT OAuthTokenResponse {
   bool operator==(const OAuthTokenResponse&) const = default;
 };
 
+/// \brief Request to initiate a server-side scan planning operation.
+struct ICEBERG_REST_EXPORT PlanTableScanRequest {
+  std::optional<int64_t> snapshot_id;
+  std::vector<std::string> select;
+  std::shared_ptr<Expression> filter;
+  bool case_sensitive = true;
+  bool use_snapshot_schema = false;
+  std::optional<int64_t> start_snapshot_id;
+  std::optional<int64_t> end_snapshot_id;
+  std::vector<std::string> statsFields;
+  std::optional<int64_t> min_rows_required;
+
+  Status Validate() const;
+
+  bool operator==(const PlanTableScanRequest&) const;
+};
+
+/// \brief Base response containing scan tasks and delete files returned by 
scan plan
+/// endpoints.
+struct ICEBERG_REST_EXPORT BaseScanTaskResponse {

Review Comment:
   There is a trade-off in designing the `BaseScanTaskResponse` class hierachy. 
All other rest models defined in this file are trivial C++ structs that support 
aggregate initialization. That's why we didn't declare the `Validate()` 
function to be virtual. Using class hierachy like the Java equivalents disables 
aggregate initialization. That works perfect before adding 
`BaseScanTaskResponse`.
   
   If we want to keep the current `BaseScanTaskResponse` class hierachy, 
declaring `BaseScanTaskResponse::Validate` as non-virtual looks error-prone and 
downstreams have to figure out which specific subclass of BaseScanTaskResponse 
is.
   
   Instead, should we define it as something like below?
   
   ```cpp
   struct ICEBERG_REST_EXPORT ScanTaskResponse {
     // common fields
     std::vector<std::string> plan_tasks;
     ...
   
     struct PlanTableScanResponse {
       std::string plan_status;
       std::string plan_id;
     };
   
     struct FetchPlanningResultResponse {
       std::string plan_status;
     };
   
     ...
   
     std::variant<PlanTableScanResponse, FetchPlanningResultResponse, ...>
   
     Status Validate() const;
   }'
   ```



##########
src/iceberg/catalog/rest/error_handlers.cc:
##########
@@ -183,4 +186,52 @@ Status ViewCommitErrorHandler::Accept(const ErrorResponse& 
error) const {
   return DefaultErrorHandler::Accept(error);
 }
 
+const std::shared_ptr<ScanPlanErrorHandler>& ScanPlanErrorHandler::Instance() {
+  static const std::shared_ptr<ScanPlanErrorHandler> instance{new 
ScanPlanErrorHandler()};
+  return instance;
+}
+
+Status ScanPlanErrorHandler::Accept(const ErrorResponse& error) const {
+  switch (error.code) {
+    case 404:
+      if (error.type == kNoSuchNamespaceException) {
+        return NoSuchNamespace(error.message);
+      }
+      if (error.type == kNoSuchTableException) {
+        return NoSuchTable(error.message);
+      }
+      if (error.type == kNoSuchPlanIdException) {
+        return NoSuchPlanId(error.message);
+      }
+      if (error.type == kNoSuchPlanTaskException) {
+        return NoSuchPlanTask(error.message);
+      }
+      return NotFound(error.message);
+    case 406:
+      return NotSupported(error.message);

Review Comment:
   
   ```suggestion
   ```
   
   I don't see these lines from Java. Why do we need them?



##########
src/iceberg/catalog/rest/resource_paths.cc:
##########
@@ -102,4 +102,27 @@ Result<std::string> ResourcePaths::CommitTransaction() 
const {
   return std::format("{}/v1/{}transactions/commit", base_uri_, prefix_);
 }
 
+Result<std::string> ResourcePaths::ScanPlan(const TableIdentifier& ident) 
const {
+  ICEBERG_ASSIGN_OR_RAISE(std::string encoded_namespace, 
EncodeNamespace(ident.ns));
+  ICEBERG_ASSIGN_OR_RAISE(std::string encoded_table_name, 
EncodeString(ident.name));
+  return std::format("{}/v1/{}namespaces/{}/tables/{}/plan", base_uri_, 
prefix_,
+                     encoded_namespace, encoded_table_name);
+}
+
+Result<std::string> ResourcePaths::ScanPlan(const TableIdentifier& ident,
+                                            const std::string& plan_id) const {
+  ICEBERG_ASSIGN_OR_RAISE(std::string encoded_namespace, 
EncodeNamespace(ident.ns));
+  ICEBERG_ASSIGN_OR_RAISE(std::string encoded_table_name, 
EncodeString(ident.name));
+  ICEBERG_ASSIGN_OR_RAISE(std::string encoded_plan_id, EncodeString(plan_id));
+  return std::format("{}/v1/{}namespaces/{}/tables/{}/plan/{}", base_uri_, 
prefix_,
+                     encoded_namespace, encoded_table_name, encoded_plan_id);
+}
+
+Result<std::string> ResourcePaths::ScanTask(const TableIdentifier& ident) 
const {

Review Comment:
   Should we merge this with the above one by using `std::optional<int64_t> 
plan_id`?



##########
src/iceberg/catalog.h:
##########
@@ -26,8 +26,10 @@
 #include <unordered_set>
 #include <vector>
 
+#include "iceberg/catalog/rest/types.h"

Review Comment:
   This is a generic catalog interface so we need abstract away rest catalog 
specific concepts.



##########
src/iceberg/catalog/rest/types.h:
##########
@@ -295,4 +298,73 @@ struct ICEBERG_REST_EXPORT OAuthTokenResponse {
   bool operator==(const OAuthTokenResponse&) const = default;
 };
 
+/// \brief Request to initiate a server-side scan planning operation.
+struct ICEBERG_REST_EXPORT PlanTableScanRequest {
+  std::optional<int64_t> snapshot_id;
+  std::vector<std::string> select;
+  std::shared_ptr<Expression> filter;
+  bool case_sensitive = true;
+  bool use_snapshot_schema = false;
+  std::optional<int64_t> start_snapshot_id;
+  std::optional<int64_t> end_snapshot_id;
+  std::vector<std::string> statsFields;
+  std::optional<int64_t> min_rows_required;

Review Comment:
   
   ```suggestion
     std::optional<int64_t> min_rows_requested;
   ```



##########
src/iceberg/catalog.h:
##########
@@ -26,8 +26,10 @@
 #include <unordered_set>
 #include <vector>
 
+#include "iceberg/catalog/rest/types.h"
 #include "iceberg/result.h"
 #include "iceberg/table_identifier.h"
+#include "iceberg/table_scan.h"

Review Comment:
   Let's use forward declaration as much as possible



##########
src/iceberg/catalog/rest/types.h:
##########
@@ -295,4 +298,73 @@ struct ICEBERG_REST_EXPORT OAuthTokenResponse {
   bool operator==(const OAuthTokenResponse&) const = default;
 };
 
+/// \brief Request to initiate a server-side scan planning operation.
+struct ICEBERG_REST_EXPORT PlanTableScanRequest {
+  std::optional<int64_t> snapshot_id;
+  std::vector<std::string> select;
+  std::shared_ptr<Expression> filter;
+  bool case_sensitive = true;
+  bool use_snapshot_schema = false;
+  std::optional<int64_t> start_snapshot_id;
+  std::optional<int64_t> end_snapshot_id;
+  std::vector<std::string> statsFields;
+  std::optional<int64_t> min_rows_required;
+
+  Status Validate() const;
+
+  bool operator==(const PlanTableScanRequest&) const;
+};
+
+/// \brief Base response containing scan tasks and delete files returned by 
scan plan
+/// endpoints.
+struct ICEBERG_REST_EXPORT BaseScanTaskResponse {
+  std::vector<std::string> plan_tasks;
+  std::vector<FileScanTask> file_scan_tasks;

Review Comment:
   It seems that using `std::shared_ptr<FileScanTask>` and 
`std::shared_ptr<DataFile>` is more flexible and consistent with the repo.



##########
src/iceberg/catalog/rest/types.h:
##########
@@ -295,4 +298,73 @@ struct ICEBERG_REST_EXPORT OAuthTokenResponse {
   bool operator==(const OAuthTokenResponse&) const = default;
 };
 
+/// \brief Request to initiate a server-side scan planning operation.
+struct ICEBERG_REST_EXPORT PlanTableScanRequest {
+  std::optional<int64_t> snapshot_id;
+  std::vector<std::string> select;
+  std::shared_ptr<Expression> filter;
+  bool case_sensitive = true;
+  bool use_snapshot_schema = false;
+  std::optional<int64_t> start_snapshot_id;
+  std::optional<int64_t> end_snapshot_id;
+  std::vector<std::string> statsFields;

Review Comment:
   
   ```suggestion
     std::vector<std::string> stats_fields;
   ```



##########
src/iceberg/catalog/rest/json_serde_internal.h:
##########
@@ -62,4 +66,26 @@ ICEBERG_DECLARE_JSON_SERDE(OAuthTokenResponse)
 
 #undef ICEBERG_DECLARE_JSON_SERDE
 
+ICEBERG_REST_EXPORT Result<PlanTableScanResponse> 
PlanTableScanResponseFromJson(

Review Comment:
   Same as my other comments, it would be good to add both `ToJson` and 
`FromJson` for these new request and response types just like others.



##########
src/iceberg/json_serde_internal.h:
##########
@@ -413,4 +417,39 @@ ICEBERG_EXPORT nlohmann::json ToJson(const 
TableRequirement& requirement);
 ICEBERG_EXPORT Result<std::unique_ptr<TableRequirement>> 
TableRequirementFromJson(
     const nlohmann::json& json);
 
+/// \brief Deserializes a JSON object into a `DataFile` object.
+///
+/// Parses a DataFile from the REST Catalog JSON format. Maps (column-sizes,
+/// value-counts, etc.) use the CountMap/BinaryMap parallel arrays format.
+/// Binary fields (lower-bounds, upper-bounds, key-metadata) are 
base64-encoded.
+///
+/// \param json The JSON object representing a `DataFile`.
+/// \param partitionSpecById Map from spec ID to PartitionSpec for type-aware 
partition
+/// parsing.
+/// \param schema The table schema, used with partitionSpecById to resolve 
partition
+/// types.
+/// \return A `DataFile` object or an error if the conversion fails.
+ICEBERG_EXPORT Result<DataFile> DataFileFromJson(const nlohmann::json& json);

Review Comment:
   I would highly recommend adding `ToJson` for `DataFile` and `FileScanTask` 
so we can have roundtrip test coverage. In addition, although this library 
currently does not need to implement rest catalog server, it should be a 
building block in case downstream users need to do that.



##########
src/iceberg/catalog.h:
##########
@@ -188,6 +190,44 @@ class ICEBERG_EXPORT Catalog {
   /// \return a Table instance or ErrorKind::kAlreadyExists if the table 
already exists
   virtual Result<std::shared_ptr<Table>> RegisterTable(
       const TableIdentifier& identifier, const std::string& 
metadata_file_location) = 0;
+
+  /// \brief Initiate a scan planning operation for the given table.
+  ///
+  /// \param table The table to scan.
+  /// \param context The scan context containing snapshot, filter, and other 
options.
+  /// \return A PlanTableScanResponse with the plan status and initial scan 
tasks.
+  virtual Result<rest::PlanTableScanResponse> PlanTableScan(

Review Comment:
   Same as above, we should use better abstraction. Server side scan planning 
needs to be handled inside the table object returned by `LoadTable` above. 
Perhaps we need to introduce the same `RestTable` class hierachy just like the 
Java impl.



##########
src/iceberg/catalog.h:
##########
@@ -188,6 +190,44 @@ class ICEBERG_EXPORT Catalog {
   /// \return a Table instance or ErrorKind::kAlreadyExists if the table 
already exists
   virtual Result<std::shared_ptr<Table>> RegisterTable(
       const TableIdentifier& identifier, const std::string& 
metadata_file_location) = 0;
+
+  /// \brief Initiate a scan planning operation for the given table.
+  ///
+  /// \param table The table to scan.
+  /// \param context The scan context containing snapshot, filter, and other 
options.
+  /// \return A PlanTableScanResponse with the plan status and initial scan 
tasks.
+  virtual Result<rest::PlanTableScanResponse> PlanTableScan(

Review Comment:
   It this requires larger changes, we can split them into smaller PRs. This PR 
can just focus on adding rest models with ser/de support and error handlers.



##########
src/iceberg/json_serde_internal.h:
##########
@@ -413,4 +417,39 @@ ICEBERG_EXPORT nlohmann::json ToJson(const 
TableRequirement& requirement);
 ICEBERG_EXPORT Result<std::unique_ptr<TableRequirement>> 
TableRequirementFromJson(
     const nlohmann::json& json);
 
+/// \brief Deserializes a JSON object into a `DataFile` object.
+///
+/// Parses a DataFile from the REST Catalog JSON format. Maps (column-sizes,
+/// value-counts, etc.) use the CountMap/BinaryMap parallel arrays format.
+/// Binary fields (lower-bounds, upper-bounds, key-metadata) are 
base64-encoded.
+///
+/// \param json The JSON object representing a `DataFile`.
+/// \param partitionSpecById Map from spec ID to PartitionSpec for type-aware 
partition
+/// parsing.
+/// \param schema The table schema, used with partitionSpecById to resolve 
partition
+/// types.
+/// \return A `DataFile` object or an error if the conversion fails.
+ICEBERG_EXPORT Result<DataFile> DataFileFromJson(const nlohmann::json& json);
+
+ICEBERG_EXPORT Result<DataFile> DataFileFromJson(

Review Comment:
   BTW, let's revert changes in this file. These json serde impls on DataFile 
and FileScanTask are specific to the rest catalog spec. 
`src/iceberg/catalog/rest/json_serde.cc` is the home. Java 
`TableScanResponseParser.java` is doing the same thing.



##########
src/iceberg/json_serde_internal.h:
##########
@@ -413,4 +417,39 @@ ICEBERG_EXPORT nlohmann::json ToJson(const 
TableRequirement& requirement);
 ICEBERG_EXPORT Result<std::unique_ptr<TableRequirement>> 
TableRequirementFromJson(
     const nlohmann::json& json);
 
+/// \brief Deserializes a JSON object into a `DataFile` object.
+///
+/// Parses a DataFile from the REST Catalog JSON format. Maps (column-sizes,
+/// value-counts, etc.) use the CountMap/BinaryMap parallel arrays format.
+/// Binary fields (lower-bounds, upper-bounds, key-metadata) are 
base64-encoded.
+///
+/// \param json The JSON object representing a `DataFile`.
+/// \param partitionSpecById Map from spec ID to PartitionSpec for type-aware 
partition
+/// parsing.
+/// \param schema The table schema, used with partitionSpecById to resolve 
partition
+/// types.
+/// \return A `DataFile` object or an error if the conversion fails.
+ICEBERG_EXPORT Result<DataFile> DataFileFromJson(const nlohmann::json& json);
+
+ICEBERG_EXPORT Result<DataFile> DataFileFromJson(
+    const nlohmann::json& json,
+    const std::unordered_map<int32_t, std::shared_ptr<PartitionSpec>>& 
partitionSpecById,

Review Comment:
   
   ```suggestion
       const std::unordered_map<int32_t, std::shared_ptr<PartitionSpec>>& 
partition_spec_by_id,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to