Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


ZENOTME commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1579401733


##
crates/e2e_test/testdata/docker-compose.yaml:
##
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+version: '3.8'
+
+services:
+  rest:
+    image: tabulario/iceberg-rest:0.10.0
+    environment:
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password
+      - AWS_REGION=us-east-1
+      - CATALOG_CATOLOG__IMPL=org.apache.iceberg.jdbc.JdbcCatalog
+      - CATALOG_URI=jdbc:sqlite:file:/tmp/iceberg_rest_mode=memory
+      - CATALOG_WAREHOUSE=s3://icebergdata/demo
+      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
+      - CATALOG_S3_ENDPOINT=http://minio:9000
+    depends_on:
+      - minio
+    links:
+      - minio:icebergdata.minio
+    expose:
+      - 8181
+
+  minio:

Review Comment:
   How about changing `minio/minio:RELEASE.2024-03-07T00-43-48Z` and 
`minio/mc:RELEASE.2024-03-07T00-31-49Z` to `minio/minio:latest` and 
`minio/mc:latest`?






Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1579401183


##
crates/e2e_test/tests/append_data_file_test.rs:
##
@@ -0,0 +1,212 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Integration tests for rest catalog.
+
+use arrow_array::{ArrayRef, BooleanArray, Int32Array, RecordBatch, StringArray};
+use futures::TryStreamExt;
+use iceberg::io::{S3_ACCESS_KEY_ID, S3_ENDPOINT, S3_REGION, S3_SECRET_ACCESS_KEY};
+use iceberg::spec::{NestedField, PrimitiveType, Schema, Type};
+use iceberg::transaction::Transaction;
+use iceberg::writer::base_writer::data_file_writer::{DataFileWriterBuilder, DataFileWriterConfig};
+use iceberg::writer::file_writer::location_generator::{
+    DefaultFileNameGenerator, DefaultLocationGenerator,
+};
+use iceberg::writer::file_writer::ParquetWriterBuilder;
+use iceberg::writer::{IcebergWriter, IcebergWriterBuilder};
+use iceberg::{Catalog, Namespace, NamespaceIdent, TableCreation};
+use iceberg_catalog_rest::{RestCatalog, RestCatalogConfig};
+use iceberg_test_utils::docker::DockerCompose;
+use iceberg_test_utils::{normalize_test_name, set_up};
+use parquet::file::properties::WriterProperties;
+use port_scanner::scan_port_addr;
+use std::collections::HashMap;
+use std::sync::Arc;
+use tokio::time::sleep;
+
+const REST_CATALOG_PORT: u16 = 8181;
+
+struct TestFixture {
+    _docker_compose: DockerCompose,
+    rest_catalog: RestCatalog,
+}
+
+async fn set_test_fixture(func: &str) -> TestFixture {
+    set_up();
+    let docker_compose = DockerCompose::new(
+        normalize_test_name(format!("{}_{func}", module_path!())),
+        format!("{}/testdata", env!("CARGO_MANIFEST_DIR")),
+    );
+
+    // Start docker compose
+    docker_compose.run();
+
+    let rest_catalog_ip = docker_compose.get_container_ip("rest");
+
+    let read_port = format!("{}:{}", rest_catalog_ip, REST_CATALOG_PORT);
+    loop {
+        if !scan_port_addr(&read_port) {
+            log::info!("Waiting for 1s rest catalog to ready...");
+            sleep(std::time::Duration::from_millis(1000)).await;
+        } else {
+            break;
+        }
+    }
+
+    let container_ip = docker_compose.get_container_ip("minio");

Review Comment:
   I've exposed the ports, and this allows me to just point to 127.0.0.1 👍 






Re: [I] How to move Iceberg table from one location to another [iceberg]

2024-04-25 Thread via GitHub


cccs-jc commented on issue #3142:
URL: https://github.com/apache/iceberg/issues/3142#issuecomment-2077081125

   @ksmatharoo, thank you for providing your code to replace the paths within 
the Iceberg metadata.
   
   I've implemented a similar solution using Python's fastavro to handle Avro 
files. In my approach, I read the Avro files, replace the paths, and then save 
the modified files.
   
   So far, everything seems to be working well. However, I noticed that your 
code is more sophisticated, particularly in how you manage lists of known 
paths. I'm curious if there's a specific reason for this. While my solution is 
straightforward, I'm wondering if there might be subtle issues that I haven't 
considered. Any insights you could provide would be appreciated!
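   
   For reference, a rough sketch of the same rewrite idea in Java, using Avro's 
generic reader/writer instead of fastavro (the class name and the 
"replace every matching string" policy are illustrative, not taken from either 
of our solutions):
   
   ```
   import java.io.File;
   import java.util.List;
   import org.apache.avro.Schema;
   import org.apache.avro.file.DataFileReader;
   import org.apache.avro.file.DataFileWriter;
   import org.apache.avro.generic.GenericDatumReader;
   import org.apache.avro.generic.GenericDatumWriter;
   import org.apache.avro.generic.GenericRecord;
   
   public class AvroPathRewriter {
   
     public static void rewrite(File in, File out, String oldPrefix, String newPrefix)
         throws Exception {
       try (DataFileReader<GenericRecord> reader =
           new DataFileReader<>(in, new GenericDatumReader<GenericRecord>())) {
         Schema schema = reader.getSchema();
         try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
           writer.create(schema, out);
           for (GenericRecord record : reader) {
             // Rewrite paths in place, then copy the record to the new file
             writer.append((GenericRecord) replacePaths(record, oldPrefix, newPrefix));
           }
         }
       }
     }
   
     // Recursively replace any string value that starts with oldPrefix
     @SuppressWarnings("unchecked")
     private static Object replacePaths(Object value, String oldPrefix, String newPrefix) {
       if (value instanceof CharSequence) {
         String s = value.toString();
         return s.startsWith(oldPrefix) ? newPrefix + s.substring(oldPrefix.length()) : s;
       } else if (value instanceof GenericRecord) {
         GenericRecord record = (GenericRecord) value;
         for (Schema.Field field : record.getSchema().getFields()) {
           record.put(field.pos(), replacePaths(record.get(field.pos()), oldPrefix, newPrefix));
         }
         return record;
       } else if (value instanceof List) {
         List<Object> list = (List<Object>) value;
         list.replaceAll(item -> replacePaths(item, oldPrefix, newPrefix));
         return list;
       }
       return value;
     }
   }
   ```
   
   One thing this naive version glosses over (and a possible reason for tracking 
known paths explicitly): blindly rewriting every string that happens to start 
with the prefix could touch fields that are not paths, and the table metadata, 
manifest list, and manifest files all have to stay consistent with each other.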





Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1579375646


##
crates/e2e_test/testdata/docker-compose.yaml:
##
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+version: '3.8'
+
+services:
+  rest:
+    image: tabulario/iceberg-rest:0.10.0
+    environment:
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password
+      - AWS_REGION=us-east-1
+      - CATALOG_CATOLOG__IMPL=org.apache.iceberg.jdbc.JdbcCatalog
+      - CATALOG_URI=jdbc:sqlite:file:/tmp/iceberg_rest_mode=memory
+      - CATALOG_WAREHOUSE=s3://icebergdata/demo
+      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
+      - CATALOG_S3_ENDPOINT=http://minio:9000
+    depends_on:
+      - minio
+    links:
+      - minio:icebergdata.minio
+    expose:
+      - 8181

Review Comment:
   I would love to run this from my IDE as well:
   ```suggestion
       ports:
         - 8181:8181
       expose:
         - 8181
   ```






Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1579372414


##
crates/e2e_test/testdata/docker-compose.yaml:
##
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+version: '3.8'
+
+services:
+  rest:
+    image: tabulario/iceberg-rest:0.10.0
+    environment:
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password
+      - AWS_REGION=us-east-1
+      - CATALOG_CATOLOG__IMPL=org.apache.iceberg.jdbc.JdbcCatalog
+      - CATALOG_URI=jdbc:sqlite:file:/tmp/iceberg_rest_mode=memory
+      - CATALOG_WAREHOUSE=s3://icebergdata/demo
+      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
+      - CATALOG_S3_ENDPOINT=http://minio:9000
+    depends_on:
+      - minio
+    links:
+      - minio:icebergdata.minio
+    expose:
+      - 8181
+
+  minio:
+    image: minio/minio:RELEASE.2024-03-07T00-43-48Z
+    environment:
+      - MINIO_ROOT_USER=admin
+      - MINIO_ROOT_PASSWORD=password
+      - MINIO_DOMAIN=minio
+    expose:
+      - 9001
+      - 9000

Review Comment:
   How about also exposing the management console:
   ```suggestion
       ports:
         - 9001:9001
       expose:
         - 9001
         - 9000
   ```






Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1579369755


##
crates/e2e_test/testdata/docker-compose.yaml:
##
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+version: '3.8'
+
+services:
+  rest:
+    image: tabulario/iceberg-rest:0.10.0
+    environment:
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password
+      - AWS_REGION=us-east-1
+      - CATALOG_CATOLOG__IMPL=org.apache.iceberg.jdbc.JdbcCatalog
+      - CATALOG_URI=jdbc:sqlite:file:/tmp/iceberg_rest_mode=memory
+      - CATALOG_WAREHOUSE=s3://icebergdata/demo
+      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
+      - CATALOG_S3_ENDPOINT=http://minio:9000
+    depends_on:
+      - minio
+    links:
+      - minio:icebergdata.minio
+    expose:
+      - 8181
+
+  minio:

Review Comment:
   It doesn't boot on my end:
   
   ```
   ➜  iceberg-rust git:(tx_append) docker logs -f d7a12d1f9d30
   Formatting 1st pool, 1 set(s), 1 drives per set.
   WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
   MinIO Object Storage Server
   Copyright: 2015-2024 MinIO, Inc.
   License: GNU AGPLv3
   Version: RELEASE.2024-03-07T00-43-48Z (go1.21.8 linux/arm64)

   Status: 1 Online, 0 Offline.
   S3-API: http://172.20.0.2:9000  http://127.0.0.1:9000
   Console: http://172.20.0.2:9001 http://127.0.0.1:9001

   Documentation: https://min.io/docs/minio/linux/index.html
   Warning: The standard parity is set to 0. This can lead to data loss.

   API: ListObjectsV2(bucket=icebergdata)
   Time: 12:11:34 UTC 04/25/2024
   DeploymentID: 0d2c88aa-2393-4c17-a28c-560e6cfe4b9b
   RequestID: 17C984C03C8E1AFA
   RemoteHost: 172.20.0.3
   Host: minio:9000
   UserAgent: MinIO (linux; arm64) minio-go/v7.0.67 mc/RELEASE.2024-03-07T00-31-49Z
   Error: volume not found (cmd.StorageErr)
  7: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
  6: internal/logger/logonce.go:149:logger.LogOnceIf()
  5: cmd/erasure-server-pool.go:1516:cmd.(*erasureServerPools).ListObjects()
  4: cmd/erasure-server-pool.go:1275:cmd.(*erasureServerPools).ListObjectsV2()
  3: cmd/bucket-listobjects-handlers.go:210:cmd.objectAPIHandlers.listObjectsV2Handler()
  2: cmd/bucket-listobjects-handlers.go:156:cmd.objectAPIHandlers.ListObjectsV2Handler()
  1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()

You are running an older version of MinIO released 1 month before the latest release
Update: Run `mc admin update ALIAS`
   ```






Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1579368639


##
crates/e2e_test/testdata/docker-compose.yaml:
##
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+version: '3.8'
+
+services:
+  rest:
+    image: tabulario/iceberg-rest:0.10.0
+    environment:
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password
+      - AWS_REGION=us-east-1
+      - CATALOG_CATOLOG__IMPL=org.apache.iceberg.jdbc.JdbcCatalog

Review Comment:
   There is a typo in this (`CATOLOG`). I think this line can go.
   ```suggestion
   ```






Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2024-04-25 Thread via GitHub


nastra commented on PR #9008:
URL: https://github.com/apache/iceberg/pull/9008#issuecomment-2077042155

   @epgif can you please address the test failures?





Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-25 Thread via GitHub


nastra commented on code in PR #9852:
URL: https://github.com/apache/iceberg/pull/9852#discussion_r1579326259


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##
@@ -209,11 +262,63 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) {
     }
   }
 
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+    if (!isValidIdentifier(identifier)) {
+      return false;
+    }
+
+    try {
+      String database = identifier.namespace().level(0);
+      String viewName = identifier.name();
+      Table table = clients.run(client -> client.getTable(database, viewName));
+      HiveOperationsBase.validateTableIsIcebergView(

Review Comment:
   @nk1506 you should be fine removing
   ```
   Table table = clients.run(client -> client.getTable(database, viewName));
   HiveOperationsBase.validateTableIsIcebergView(...)
   ```
   
   as that's what I meant in my comment above.






Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-25 Thread via GitHub


nastra commented on code in PR #9852:
URL: https://github.com/apache/iceberg/pull/9852#discussion_r1579323906


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##
@@ -209,11 +262,63 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) {
     }
   }
 
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+    if (!isValidIdentifier(identifier)) {
+      return false;
+    }
+
+    try {
+      String database = identifier.namespace().level(0);
+      String viewName = identifier.name();
+      Table table = clients.run(client -> client.getTable(database, viewName));
+      HiveOperationsBase.validateTableIsIcebergView(
+          table, CatalogUtil.fullTableName(name, identifier));
+
+      HiveViewOperations ops = (HiveViewOperations) newViewOps(identifier);
+      ViewMetadata lastViewMetadata = null;
+
+      try {
+        lastViewMetadata = ops.current();
+      } catch (NotFoundException e) {
+        LOG.warn("Failed to load view metadata for view: {}", identifier, e);
+      }
+
+      clients.run(
+          client -> {
+            client.dropTable(database, viewName);
+            return null;
+          });
+
+      if (lastViewMetadata != null) {

Review Comment:
   But I'm curious to hear other people's opinions on whether view metadata 
should be deleted immediately when the view is dropped.






Re: [PR] AWS: Fix TestGlueCatalogTable#testCreateTable [iceberg]

2024-04-25 Thread via GitHub


nastra merged PR #10221:
URL: https://github.com/apache/iceberg/pull/10221





Re: [PR] Flink: Fix bounded source state restore record duplication [iceberg]

2024-04-25 Thread via GitHub


pvary commented on PR #10208:
URL: https://github.com/apache/iceberg/pull/10208#issuecomment-2076940629

   Are we good to go with this, @stevenzwu?





Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-25 Thread via GitHub


nastra commented on code in PR #9852:
URL: https://github.com/apache/iceberg/pull/9852#discussion_r1579292821


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##
@@ -209,11 +262,58 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) {
     }
   }
 
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+    if (!isValidIdentifier(identifier)) {
+      return false;
+    }
+
+    try {
+      String database = identifier.namespace().level(0);
+      String viewName = identifier.name();
+
+      HiveViewOperations ops = (HiveViewOperations) newViewOps(identifier);
+      ViewMetadata lastViewMetadata = null;
+      try {
+        lastViewMetadata = ops.current();
+      } catch (NotFoundException e) {
+        LOG.warn("Failed to load view metadata for view: {}", identifier, e);
+      }
+
+      clients.run(
+          client -> {
+            client.dropTable(database, viewName, false, false);
+            return null;
+          });
+
+      if (lastViewMetadata != null) {
+        CatalogUtil.dropViewMetadata(ops.io(), lastViewMetadata);
+        LOG.info("Dropped view: {}", identifier);
+        return true;

Review Comment:
   I don't think it's correct to return `true` here. It should be at L295 
(as you had it before).






Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-25 Thread via GitHub


pvary commented on PR #10200:
URL: https://github.com/apache/iceberg/pull/10200#issuecomment-2076931744

   > After taking a closer look at `BaseTaskWriter`, I think we may have a correctness issue when encoding changes if the table contains multiple specs. Our current implementation of `BaseTaskWriter` assumes all writes are encoded against the current spec. However, what if there are some matching keys in other specs? Written deletes will be scoped to the current partition spec and will not apply to data in other specs, potentially missing to upsert some records. I think we would want to eventually migrate Flink to new writers that inherit `PartitioningWriter`. This is out of scope of this PR, however.
   
   I agree that we need to fix this. I also agree that it belongs in a different PR.
   
   > I am also not sure we need `ContinuousFileScopedPositionDeleteWriter`. I understand we want to solve the companion issue before Flink migrates to `PartitioningWriter` so we have to come up with a fix for `TaskWriter`. What about directly using `SortingPositionOnlyDeleteWriter` with file granularity in `BaseEqualityDeltaWriter`? We only need to pass a closure to create new position delete writers and that class should already sort deletes for us on the fly. We never needed to persist deleted rows in position deletes so no behaviour change.
   
   Updated to use the `SortingPositionOnlyDeleteWriter`. It needed a small "fix" to prevent writing an empty delete file, and I would like to ask you to double-check that.
   
   There is also a behavioural change: the previous writer rolled deletes at the `DEFAULT_RECORDS_NUM_THRESHOLD` = 100_000L threshold (no config, just a hard-coded value). We can't do this with the new writer, but I think it is highly unlikely that anyone reached this threshold in a production scenario. If we need this feature back, we either need to use the `SortedPosDeleteWriter`, or we can provide a similar feature via `RollingFileWriter`. The issue is that `RollingFileWriter` uses a size threshold instead of a record-count threshold, and we would need to rewrite/enhance `RollingFileWriter` to use a closure to create a writer instead of the factory (as it does today).
   
   I see 3 options:
   1. Accept that there is no 100_000L threshold for the positional delete file size anymore
   2. Use the old `SortedPosDeleteWriter` for PARTITION granularity. FILE granularity is a new feature and will most probably not create big files (as data files tend to be bigger than delete files, and when a data file is rolled, the delete file rolls automatically), so the behaviour change will only affect very edge cases
   3. Invest in implementing the `RollingFileWriter` changes (a rough sketch of the record-count rolling idea is below)
   
   I would vote for the 1st option, and would accept the 2nd if that is the consensus. I do not think the 3rd is worth the investment.
   
   What do you think @stevenzwu, @aokolnychyi?
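   
   For illustration only, a rough standalone sketch of the record-count rolling 
idea from option 3 (all names here are made up; this is not the PR's code):
   
   ```
   import java.io.IOException;
   import java.util.function.Supplier;
   
   /** Hypothetical sketch: rolls to a fresh delegate writer every N records. */
   class CountRollingWriter<T> implements AutoCloseable {
   
     /** Minimal writer abstraction so the sketch is self-contained. */
     interface RecordWriter<T> extends AutoCloseable {
       void write(T record) throws IOException;
   
       @Override
       void close() throws IOException;
     }
   
     private final Supplier<RecordWriter<T>> newWriter; // closure instead of a factory
     private final long recordsNumThreshold;
     private RecordWriter<T> current = null;
     private long count = 0;
   
     CountRollingWriter(Supplier<RecordWriter<T>> newWriter, long recordsNumThreshold) {
       this.newWriter = newWriter;
       this.recordsNumThreshold = recordsNumThreshold;
     }
   
     void write(T record) throws IOException {
       if (current == null) {
         current = newWriter.get(); // lazily open, so an empty writer produces no file
       }
   
       current.write(record);
       count += 1;
   
       if (count >= recordsNumThreshold) {
         // Roll: finish this file; the next write opens a new one
         current.close();
         current = null;
         count = 0;
       }
     }
   
     @Override
     public void close() throws IOException {
       if (current != null) {
         current.close();
         current = null;
       }
     }
   }
   ```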
   





Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-25 Thread via GitHub


pvary commented on code in PR #10200:
URL: https://github.com/apache/iceberg/pull/10200#discussion_r1579259488


##
core/src/main/java/org/apache/iceberg/io/SortedPosDeleteWriter.java:
##
@@ -174,20 +208,21 @@ private void flushDeletes() {
     } catch (IOException e) {
       setFailure(e);
       throw new UncheckedIOException(
-          "Failed to write the sorted path/pos pairs to pos-delete file: "
-              + outputFile.encryptingOutputFile().location(),

Review Comment:
   reverted



##
core/src/main/java/org/apache/iceberg/io/SortedPosDeleteWriter.java:
##
@@ -59,20 +60,58 @@ class SortedPosDeleteWriter<T> implements FileWriter<PositionDelete<T>, DeleteWriteResult> {
       OutputFileFactory fileFactory,
       FileFormat format,
       StructLike partition,
-      long recordsNumThreshold) {
+      long recordsNumThreshold,
+      DeleteGranularity deleteGranularity) {
     this.appenderFactory = appenderFactory;
     this.fileFactory = fileFactory;
     this.format = format;
     this.partition = partition;
     this.recordsNumThreshold = recordsNumThreshold;
+    this.deleteGranularity = deleteGranularity;
+  }
+
+  SortedPosDeleteWriter(

Review Comment:
   reverted






Re: [I] Test suite for manifest filtering [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on issue #351:
URL: https://github.com/apache/iceberg-rust/issues/351#issuecomment-2076879150

   Assigned, thanks @s-akhtar-baig!





Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-25 Thread via GitHub


pvary commented on code in PR #10200:
URL: https://github.com/apache/iceberg/pull/10200#discussion_r1579255478


##
core/src/main/java/org/apache/iceberg/deletes/SortingPositionOnlyDeleteWriter.java:
##
@@ -118,6 +118,10 @@ private DeleteWriteResult writeFileDeletes() throws IOException {
 
   @SuppressWarnings("CollectionUndefinedEquality")
   private DeleteWriteResult writeDeletes(Collection<CharSequence> paths) throws IOException {
+    if (paths.size() == 0) {

Review Comment:
   I needed to add this so we do not create an empty delete file.






Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-25 Thread via GitHub


nk1506 commented on code in PR #9852:
URL: https://github.com/apache/iceberg/pull/9852#discussion_r1579255109


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##
@@ -209,11 +262,63 @@ public boolean dropTable(TableIdentifier identifier, 
boolean purge) {
 }
   }
 
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+if (!isValidIdentifier(identifier)) {
+  return false;
+}
+
+try {
+  String database = identifier.namespace().level(0);
+  String viewName = identifier.name();
+  Table table = clients.run(client -> client.getTable(database, viewName));
+  HiveOperationsBase.validateTableIsIcebergView(

Review Comment:
   It seems verification alone is not enough here; there must be some way to tell that the dropView operation should return `false`. Currently `HiveViewOperations` doesn't throw an error during [refresh](https://github.com/apache/iceberg/pull/9852/files#diff-db46657b84d66e084e15f31b8dab21577efb2ae7102863f94c6c9477782de676R102): with `lastViewMetadata = ops.current();`, `lastViewMetadata` will simply be null. If we want to return `false` in some cases, we should validate somewhere.
   
   Any thoughts?
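   
   For discussion, a rough sketch of one possible shape (hypothetical code, not 
the PR's; the `ICEBERG_VIEW` parameter value and the skipped metadata cleanup 
are placeholders for whatever we settle on):
   
   ```
   import org.apache.hadoop.hive.metastore.IMetaStoreClient;
   import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
   import org.apache.hadoop.hive.metastore.api.Table;
   import org.apache.thrift.TException;
   
   class DropViewSketch {
   
     // Sketched against a bare IMetaStoreClient; the real code goes through
     // Iceberg's ClientPool and HiveViewOperations.
     boolean dropView(IMetaStoreClient client, String database, String viewName)
         throws TException {
       Table table;
       try {
         table = client.getTable(database, viewName);
       } catch (NoSuchObjectException e) {
         return false; // the view does not exist -> report false instead of throwing
       }
   
       // Placeholder check: only drop HMS objects that are actually Iceberg views
       if (!"ICEBERG_VIEW".equalsIgnoreCase(table.getParameters().get("table_type"))) {
         return false;
       }
   
       // The metadata-location lookup and file cleanup would happen here
       client.dropTable(database, viewName, false /* deleteData */, false /* ignoreUnknownTab */);
       return true;
     }
   }
   ```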






Re: [I] Implement the equality delete writer [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on issue #341:
URL: https://github.com/apache/iceberg-rust/issues/341#issuecomment-2076878340

   Assigned to you, thanks @Dysprosium0626!





Re: [I] Implement all functions of BoundPredicateVisitor for ManifestFilterVisitor [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on issue #350:
URL: https://github.com/apache/iceberg-rust/issues/350#issuecomment-2076877306

   @s-akhtar-baig Assigned, thanks!





Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1579248989


##
crates/iceberg/src/scan.rs:
##
@@ -158,8 +196,24 @@ impl TableScan {
         .await?;
 
         // Generate data file stream
-        let mut entries = iter(manifest_list.entries());
-        while let Some(entry) = entries.next().await {
+        for entry in manifest_list.entries() {
+            // If this scan has a filter, check the partition evaluator cache for an existing
+            // PartitionEvaluator that matches this manifest's partition spec ID.
+            // Use one from the cache if there is one. If not, create one, put it in
+            // the cache, and take a reference to it.
+            #[allow(clippy::map_entry)]
+            if let Some(filter) = filter.as_ref() {
+                if !manifest_evaluator_cache.contains_key(&entry.partition_spec_id) {

Review Comment:
   TBH I prefer the entry API, which makes the code more concise, but this is 
not a blocker.



##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +240,25 @@ impl TableScan {
             .boxed())
     }
 
+    fn create_manifest_evaluator(
+        id: i32,
+        schema: SchemaRef,
+        table_metadata: Arc<TableMetadata>,
+        case_sensitive: bool,
+        filter: &Predicate,
+    ) -> crate::Result<ManifestEvaluator> {
+        let bound_predicate = filter.bind(schema.clone(), case_sensitive).unwrap();

Review Comment:
   ```suggestion
        let bound_predicate = filter.bind(schema.clone(), case_sensitive)?;
   ```
   
   We should not panic here.



##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +240,25 @@ impl TableScan {
             .boxed())
     }
 
+    fn create_manifest_evaluator(
+        id: i32,
+        schema: SchemaRef,
+        table_metadata: Arc<TableMetadata>,
+        case_sensitive: bool,
+        filter: &Predicate,
+    ) -> crate::Result<ManifestEvaluator> {
+        let bound_predicate = filter.bind(schema.clone(), case_sensitive).unwrap();
+
+        let partition_spec = table_metadata.partition_spec_by_id(id).unwrap();

Review Comment:
   ```suggestion
        let partition_spec = table_metadata.partition_spec_by_id(id)?;
   ```
   
   Why panic here?






Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-25 Thread via GitHub


nk1506 commented on code in PR #9852:
URL: https://github.com/apache/iceberg/pull/9852#discussion_r1579215950


##
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveViewCommits.java:
##
@@ -0,0 +1,437 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+import static org.mockito.Mockito.any;
+import static org.mockito.Mockito.anyBoolean;
+import static org.mockito.Mockito.doAnswer;
+import static org.mockito.Mockito.doReturn;
+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.spy;
+import static org.mockito.Mockito.when;
+
+import java.io.File;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import org.apache.iceberg.CatalogProperties;
+import org.apache.iceberg.CatalogUtil;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.CommitStateUnknownException;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.view.BaseView;
+import org.apache.iceberg.view.View;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.thrift.TException;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.RegisterExtension;
+
+/** Test Hive locks and Hive errors and retry during commits. */
+public class TestHiveViewCommits {
+
+  private static final String VIEW_NAME = "test_iceberg_view";
+  private static final String DB_NAME = "hivedb";
+  private static final Namespace ns = Namespace.of(DB_NAME);
+  private static final Schema SCHEMA =
+      new Schema(
+          5,
+          required(3, "id", Types.IntegerType.get(), "unique ID"),
+          required(4, "data", Types.StringType.get()));
+  private static final TableIdentifier VIEW_IDENTIFIER = TableIdentifier.of(ns, VIEW_NAME);
+
+  @RegisterExtension
+  protected static final HiveMetastoreExtension HIVE_METASTORE_EXTENSION =
+      HiveMetastoreExtension.builder().withDatabase(DB_NAME).build();
+
+  private View view;
+
+  private static HiveCatalog catalog;
+
+  @BeforeAll
+  public static void initCatalog() {
+    catalog =
+        (HiveCatalog)
+            CatalogUtil.loadCatalog(
+                HiveCatalog.class.getName(),
+                CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE,
+                ImmutableMap.of(
+                    CatalogProperties.CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS,
+                    String.valueOf(TimeUnit.SECONDS.toMillis(10))),
+                HIVE_METASTORE_EXTENSION.hiveConf());
+  }
+
+  @BeforeEach
+  public void createTestView() {
+    view =
+        catalog
+            .buildView(VIEW_IDENTIFIER)
+            .withSchema(SCHEMA)
+            .withDefaultNamespace(ns)
+            .withQuery("hive", "select * from ns.tbl")
+            .create();
+  }
+
+  @AfterEach
+  public void dropTestView() {
+    catalog.dropView(VIEW_IDENTIFIER);
+  }
+
+  @Test
+  public void testSuppressUnlockExceptions() {
+    HiveViewOperations ops = (HiveViewOperations) ((BaseView) view).operations();
+    ViewMetadata metadataV1 = ops.current();
+    assertThat(ops.current().properties()).hasSize(0);
+
+    view.updateProperties().set("k1", "v1").commit();
+    ops.refresh();
+    ViewMetadata metadataV2 = ops.current();
+    assertThat(ops.current().properties()).hasSize(1);
+
+    HiveViewOperations spyOps = spy(ops);
+
+    AtomicReference<HiveLock> lockRef = new AtomicReference<>();
+
+    when(spyOps.lockObject())
+        .thenAnswer(
+            i -> {
+              HiveLock lock = (HiveLock) i.callRealMethod();
+              lockRef.set(lock);
+              return lock;
+            });
+
+    try {
+      spyOps.commit(metadataV2, me

Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-25 Thread via GitHub


pvary commented on code in PR #10200:
URL: https://github.com/apache/iceberg/pull/10200#discussion_r1579208675


##
core/src/main/java/org/apache/iceberg/io/SortedPosDeleteWriter.java:
##
@@ -59,20 +60,58 @@ class SortedPosDeleteWriter<T> implements FileWriter<PositionDelete<T>, DeleteWriteResult> {
       OutputFileFactory fileFactory,
       FileFormat format,
       StructLike partition,
-      long recordsNumThreshold) {
+      long recordsNumThreshold,
+      DeleteGranularity deleteGranularity) {
     this.appenderFactory = appenderFactory;
     this.fileFactory = fileFactory;
     this.format = format;
     this.partition = partition;
     this.recordsNumThreshold = recordsNumThreshold;
+    this.deleteGranularity = deleteGranularity;
+  }
+
+  SortedPosDeleteWriter(

Review Comment:
   These constructors are not utilised by our code (or only by tests), but I 
kept them for compatibility reasons.
   Do we need them? Are they part of the API?
   
   






Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-25 Thread via GitHub


pvary commented on code in PR #10200:
URL: https://github.com/apache/iceberg/pull/10200#discussion_r1579204149


##
core/src/main/java/org/apache/iceberg/io/SortedPosDeleteWriter.java:
##
@@ -174,20 +208,21 @@ private void flushDeletes() {
     } catch (IOException e) {
       setFailure(e);
       throw new UncheckedIOException(
-          "Failed to write the sorted path/pos pairs to pos-delete file: "
-              + outputFile.encryptingOutputFile().location(),

Review Comment:
   Adding the file location would require some confusing code, like:
   - Adding a local attribute `lastOutputFile` to the class
   - Updating this attribute in `createWriter`
   - Assuming that an error in `FileScopedPositionDeleteWriter` is always with the last opened writer - which is true, but I am not sure we want to build on that
   
   I think this is more confusing than having the filename in the exception message is worth, but I am open to discussion.






Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-25 Thread via GitHub


nk1506 commented on code in PR #9852:
URL: https://github.com/apache/iceberg/pull/9852#discussion_r1579198316


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveViewOperations.java:
##
@@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import static java.util.Collections.emptySet;
+
+import java.util.Collections;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.TableType;
+import org.apache.hadoop.hive.metastore.api.InvalidObjectException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.iceberg.BaseMetastoreTableOperations;
+import org.apache.iceberg.CatalogUtil;
+import org.apache.iceberg.ClientPool;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.AlreadyExistsException;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.exceptions.CommitStateUnknownException;
+import org.apache.iceberg.exceptions.NoSuchViewException;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.hadoop.ConfigProperties;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.util.PropertyUtil;
+import org.apache.iceberg.view.BaseViewOperations;
+import org.apache.iceberg.view.SQLViewRepresentation;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.iceberg.view.ViewRepresentation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Hive implementation of Iceberg {@link org.apache.iceberg.view.ViewOperations}. */
+final class HiveViewOperations extends BaseViewOperations implements HiveOperationsBase {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveViewOperations.class);
+
+  private final String fullName;
+  private final String database;
+  private final String viewName;
+  private final FileIO fileIO;
+  private final ClientPool<IMetaStoreClient, TException> metaClients;
+  private final long maxHiveTablePropertySize;
+  private final Configuration conf;
+  private final String catalogName;
+
+  HiveViewOperations(
+      Configuration conf,
+      ClientPool<IMetaStoreClient, TException> metaClients,
+      FileIO fileIO,
+      String catalogName,
+      TableIdentifier viewIdentifier) {
+    String dbName = viewIdentifier.namespace().level(0);
+    this.conf = conf;
+    this.catalogName = catalogName;
+    this.metaClients = metaClients;
+    this.fileIO = fileIO;
+    this.fullName = CatalogUtil.fullTableName(catalogName, viewIdentifier);
+    this.database = dbName;
+    this.viewName = viewIdentifier.name();
+    this.maxHiveTablePropertySize =
+        conf.getLong(HIVE_TABLE_PROPERTY_MAX_SIZE, HIVE_TABLE_PROPERTY_MAX_SIZE_DEFAULT);
+  }
+
+  @Override
+  public void doRefresh() {
+    String metadataLocation = null;
+    Table table;
+
+    try {
+      table = metaClients.run(client -> client.getTable(database, viewName));
+      HiveOperationsBase.validateTableIsIcebergView(table, fullName);
+
+      metadataLocation =
+          table.getParameters().get(BaseMetastoreTableOperations.METADATA_LOCATION_PROP);
+
+    } catch (NoSuchObjectException e) {
+      if (currentMetadataLocation() != null) {
+        throw new NoSuchViewException("View does not exist: %s.%s", database, viewName);
+      }
+    } catch (TException e) {
+      String errMsg =
+          String.format("Failed to get view info from metastore %s.%s", database, viewName);
+      throw new RuntimeException(errMsg, e);
+    } catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+      throw new Runt

[I] MinIO + Spark + hive metadata + iceberg format [iceberg]

2024-04-25 Thread via GitHub


rychu151 opened a new issue, #10222:
URL: https://github.com/apache/iceberg/issues/10222

   ### Query engine
   
   Spark
   
   ### Question
   
   I'm trying to set up a local development environment for testing purposes using Docker.
   
   **The target is to save a dataframe in Iceberg format with Hive metadata.**
   
   Here is my current docker-compose:
   
   ```
   version: "3"
   
   services:
   
 #Jupyter Notebook with PySpark & iceberg Server
 spark-iceberg:
   image: tabulario/spark-iceberg
   container_name: spark-iceberg
   build: spark/
   networks:
 iceberg_net:
   depends_on:
 #- rest
 - minio
   volumes:
 - ./warehouse:/home/iceberg/warehouse
 - ./notebooks:/home/iceberg/notebooks/notebooks
  - ./spark-iceberg/spark/jars/nessie-spark-extensions-3.5_2.12-0.80.0.jar:/opt/spark/jars/nessie-spark-extensions-3.5_2.12-0.80.0.jar
  - ./spark-iceberg/spark/conf/spark-defaults.conf:/opt/spark/conf/spark-defaults.conf
   environment:
 - AWS_ACCESS_KEY_ID=admin
 - AWS_SECRET_ACCESS_KEY=password
 - AWS_REGION=us-east-1
 - USE_STREAM_CAPABLE_STATE_STORE=true
 - CATALOG_WAREHOUSE=s3://warehouse/
   ports:
 - ":"
 - "8080:8080"
 - "1:1"
 - "10001:10001"
 
 # Minio Storage Server
 minio:
    image: bitnami/minio:latest # not minio/minio because of reported issues with the image
   container_name: minio
   environment:
 - MINIO_ROOT_USER=admin
 - MINIO_ROOT_PASSWORD=password
 - MINIO_REGION=us-east-1
 - MINIO_REGION_NAME=us-east-1
   networks:
 iceberg_net:
   aliases:
 - warehouse.minio
   ports:
 - "9001:9001"
 - "9000:9000"
 
 #hive metastore
 hive-metastore:
   image: apache/hive:4.0.0
   container_name: hive-metastore
   networks:
 iceberg_net:
   ports:
 - "9083:9083"
   environment:
   - SERVICE_NAME=metastore
   depends_on:
 - zookeeper
 - postgres
   volumes:
   - ./hive_metastore/conf/hive-site.xml:/opt/hive/conf/hive-site.xml
   ```
   
   
   spark-defaults.conf:
   
   ```
   spark.sql.extensions                      org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   spark.sql.catalog.hive_prod               org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.hive_prod.type          hive
   spark.sql.catalog.hive_prod.uri           thrift://hive-metastore:9083
   
   spark.sql.catalog.hive_prod.io-impl       org.apache.iceberg.aws.s3.S3FileIO
   spark.sql.catalog.hive_prod.s3.endpoint   http://minio:9000
   spark.sql.catalog.hive_prod.warehouse     s3://warehouse/
   hive.metastore.uris                       thrift://hive-metastore:9083
   ```
   
   and hive-site.xml
   
   ```
   <configuration>
     <property>
       <name>hive.server2.enable.doAs</name>
       <value>false</value>
     </property>
     <property>
       <name>hive.tez.exec.inplace.progress</name>
       <value>false</value>
     </property>
     <property>
       <name>hive.exec.scratchdir</name>
       <value>/opt/hive/scratch_dir</value>
     </property>
     <property>
       <name>hive.user.install.directory</name>
       <value>/opt/hive/install_dir</value>
     </property>
     <property>
       <name>tez.runtime.optimize.local.fetch</name>
       <value>true</value>
     </property>
     <property>
       <name>hive.exec.submit.local.task.via.child</name>
       <value>false</value>
     </property>
     <property>
       <name>mapreduce.framework.name</name>
       <value>local</value>
     </property>
     <property>
       <name>tez.local.mode</name>
       <value>true</value>
     </property>
     <property>
       <name>hive.execution.engine</name>
       <value>tez</value>
     </property>
     <property>
       <name>metastore.metastore.event.db.notification.api.auth</name>
       <value>false</value>
     </property>
     <property>
       <name>hive.metastore.warehouse.dir</name>
       <value>s3a://warehouse/</value>
     </property>
     <property>
       <name>fs.s3a.endpoint</name>
       <value>http://localhost:9000</value>
     </property>
     <property>
       <name>fs.s3a.access.key</name>
       <value>admin</value>
     </property>
     <property>
       <name>fs.s3a.secret.key</name>
       <value>password</value>
     </property>
     <property>
       <name>fs.s3a.path.style.access</name>
       <value>true</value>
     </property>
     <property>
       <name>fs.s3a.impl</name>
       <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
     </property>
     <property>
       <name>fs.s3a.connection.ssl.enabled</name>
       <value>false</value>
     </property>
     <property>
       <name>hive.metastore.authorization.storage.checks</name>
       <value>false</value>
       <description>Disables storage-based authorization checks to allow Hive better compatibility with MinIO.</description>
     </property>
   </configuration>
   ```
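   
   For a quick smoke test of the catalog wiring, something like this should work 
(a sketch using the Spark Java API; the table and column names are made up, and 
the same SQL statements work from PySpark):
   
   ```
   import org.apache.spark.sql.SparkSession;
   
   public class IcebergHiveSmokeTest {
     public static void main(String[] args) {
       // Assumes the spark-defaults.conf above is in effect, so the
       // "hive_prod" catalog is already configured
       SparkSession spark =
           SparkSession.builder().appName("iceberg-hive-smoke-test").getOrCreate();
   
       spark.sql("CREATE NAMESPACE IF NOT EXISTS hive_prod.db");
       spark.sql(
           "CREATE TABLE IF NOT EXISTS hive_prod.db.labels "
               + "(upload_date BIGINT, col_name STRING, label STRING) USING iceberg");
       spark.sql("INSERT INTO hive_prod.db.labels VALUES (1714000000, 'a', 'positive')");
       spark.sql("SELECT * FROM hive_prod.db.labels").show();
   
       spark.stop();
     }
   }
   ```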
   
   Using the MinIO UI I have created a bucket called `warehouse` and set it to 
public access.
   
   
   **The target is to save a dataframe in Iceberg format with Hive metadata** so 
I will be able to browse this data using Apache Druid.
   
   
   In order to create a table I use PySpark:
   
   ```
   col_name = "col_name"
   label_name = "label"
   data_name = "upload_date"
   
   schema = StructType([
   StructField(data_name, LongType(), False),
   StructField(col_name, StringType(), False),
   StructField(label_name, StringType(), False)
   ])
   
   

Re: [PR] Core: Use 'delete' / 'append' if OverwriteFiles only deletes/appends data files [iceberg]

2024-04-25 Thread via GitHub


nastra merged PR #10150:
URL: https://github.com/apache/iceberg/pull/10150





Re: [PR] Sql catalog [iceberg-rust]

2024-04-25 Thread via GitHub


JanKaul commented on PR #229:
URL: https://github.com/apache/iceberg-rust/pull/229#issuecomment-2076689179

   Thank you all for your helpful comments. I think the PR is ready for review 
again.
   
   @liurenjie1024 @sdd @odysa @ZENOTME @martin-g 





Re: [PR] AWS: Fix TestGlueCatalogTable#testCreateTable [iceberg]

2024-04-25 Thread via GitHub


aajisaka commented on PR #10221:
URL: https://github.com/apache/iceberg/pull/10221#issuecomment-2076687909

   Hi @nastra, would you review this?





Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-25 Thread via GitHub


pvary commented on code in PR #10200:
URL: https://github.com/apache/iceberg/pull/10200#discussion_r1579107619


##
data/src/test/java/org/apache/iceberg/io/TestTaskEqualityDeltaWriter.java:
##
@@ -409,6 +421,55 @@ public void testUpsertDataWithFullRowSchema() throws IOException {
         .isEqualTo(ImmutableList.of(posRecord.copy("file_path", dataFile.path(), "pos", 0L)));
   }
 
+  @TestTemplate
+  public void testDeleteFileGranularity() throws IOException {
+    withGranularity(DeleteGranularity.FILE);
+  }
+
+  @TestTemplate
+  public void testDeletePartitionGranularity() throws IOException {
+    withGranularity(DeleteGranularity.PARTITION);
+  }
+
+  private void withGranularity(DeleteGranularity granularity) throws IOException {
+    List<Integer> eqDeleteFieldIds = Lists.newArrayList(idFieldId, dataFieldId);
+    Schema eqDeleteRowSchema = table.schema();
+
+    GenericTaskDeltaWriter deltaWriter =
+        createTaskWriter(eqDeleteFieldIds, eqDeleteRowSchema, granularity);
+
+    Map<Integer, Record> expected = Maps.newHashMapWithExpectedSize(2000);
+    // Create enough records, so we have multiple files
+    for (int i = 0; i < 2000; ++i) {
+      Record record = createRecord(i, "aaa" + i);
+      deltaWriter.write(record);
+      if (i % 5 == 10) {
+        deltaWriter.delete(record);
+      } else {
+        expected.put(i, record);
+      }
+    }
+
+    // Add some deletes in the end
+    for (int i = 0; i < 199; ++i) {
+      int id = i * 10 + 1;
+      Record record = createRecord(id, "aaa" + id);
+      deltaWriter.delete(record);
+      expected.remove(id);
+    }
+
+    WriteResult result = deltaWriter.complete();
+    assertThat(result.dataFiles()).as("Should have 2 data files.").hasSize(2);

Review Comment:
   For the record, this is based on `BaseRollingWriter.ROWS_DIVISOR`, which is not configurable.
   I added a comment describing this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] AWS: Fix TestGlueCatalogTable#testCreateTable [iceberg]

2024-04-25 Thread via GitHub


aajisaka opened a new pull request, #10221:
URL: https://github.com/apache/iceberg/pull/10221

   ## Problem
   
   TestGlueCatalogTable#testCreateTable failed with the following assertion error:
   
   ```
   org.opentest4j.AssertionFailedError: 
   expected: 
 ["s3://iceberg-integ-684001615763-ap-northeast-3/writeDataLoc",
 "s3://iceberg-integ-684001615763-ap-northeast-3/writeMetaDataLoc",
 "s3://iceberg-integ-684001615763-ap-northeast-3/writeFolderStorageLoc"]
but was: 
 ["s3://iceberg-integ-684001615763-ap-northeast-3/writeFolderStorageLoc",
 "s3://iceberg-integ-684001615763-ap-northeast-3/writeDataLoc",
 "s3://iceberg-integ-684001615763-ap-northeast-3/writeMetaDataLoc"]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
org.apache.iceberg.aws.glue.TestGlueCatalogTable.testCreateTable(TestGlueCatalogTable.java:97)
   ```
   
   It's caused by the JUnit 5 upgrade #10086. After this change, the elements are no longer sorted before the lists are compared.
   
   ## Solution
   
   Use the `containsExactlyInAnyOrderElementsOf` method: 
https://www.javadoc.io/doc/org.assertj/assertj-core/latest/org/assertj/core/api/AbstractIterableAssert.html#containsExactlyInAnyOrderElementsOf(java.lang.Iterable).
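
   For illustration, a minimal self-contained AssertJ sketch of the order-insensitive assertion (class name and values are made up, not taken from the failing test):

   ```
   import static org.assertj.core.api.Assertions.assertThat;

   import java.util.List;

   public class InAnyOrderSketch {
     public static void main(String[] args) {
       List<String> actual = List.of("writeFolderStorageLoc", "writeDataLoc", "writeMetaDataLoc");
       List<String> expected = List.of("writeDataLoc", "writeMetaDataLoc", "writeFolderStorageLoc");
       // Passes regardless of element order, unlike containsExactlyElementsOf.
       assertThat(actual).containsExactlyInAnyOrderElementsOf(expected);
     }
   }
   ```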


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] MR: iceberg storage handler should set common projection pruning config [iceberg]

2024-04-25 Thread via GitHub


pvary commented on code in PR #10188:
URL: https://github.com/apache/iceberg/pull/10188#discussion_r1579101566


##
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -111,8 +111,15 @@ public void configureTableJobProperties(TableDesc tableDesc, Map<String, String>
   // @Override
   public void configureInputJobCredentials(TableDesc tableDesc, Map<String, String> secrets) {}
 
+  private void setCommonJobConf(JobConf jobConf) {

Review Comment:
   Thanks @ludlows!
   As I mentioned in another comment, could you please add some unit tests to 
ensure that this feature keeps working?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Is the "Emitting watermarks" new feature can't be used in flink sql? [iceberg]

2024-04-25 Thread via GitHub


pvary commented on issue #10219:
URL: https://github.com/apache/iceberg/issues/10219#issuecomment-2076664228

   @yeezychao: Do you happen to know what is needed from the connector side to make this work?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Support for REPLACE TABLE operation [iceberg-python]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #433:
URL: https://github.com/apache/iceberg-python/pull/433#discussion_r1579083787


##
pyiceberg/catalog/__init__.py:
##
@@ -710,6 +760,45 @@ def _get_updated_props_and_update_summary(
 
         return properties_update_summary, updated_properties
 
+    def _replace_table(
+        self,
+        identifier: Union[str, Identifier],
+        new_schema: Union[Schema, "pa.Schema"],
+        new_partition_spec: PartitionSpec,
+        new_sort_order: SortOrder,
+        new_properties: Properties,
+        new_location: Optional[str] = None,
+    ) -> Table:
+        table = self.load_table(identifier)
+        with table.transaction() as tx:
+            base_schema = table.schema()
+            new_schema = assign_fresh_schema_ids(schema_or_type=new_schema, base_schema=base_schema)

Review Comment:
   That's correct!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Refactor GlueCatalog's _commit_table [iceberg-python]

2024-04-25 Thread via GitHub


Fokko merged PR #653:
URL: https://github.com/apache/iceberg-python/pull/653


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Use 'delete' if OverwriteFiles only deletes data files [iceberg]

2024-04-25 Thread via GitHub


nastra commented on code in PR #10150:
URL: https://github.com/apache/iceberg/pull/10150#discussion_r1579059939


##
core/src/test/java/org/apache/iceberg/TestOverwrite.java:
##
@@ -295,5 +343,6 @@ public void testValidatedOverwriteWithAppendSuccess() {
         .hasMessageStartingWith("Cannot append file with rows that do not match filter");
 
     assertThat(latestSnapshot(base, branch).snapshotId()).isEqualTo(baseId);
+    assertThat(latestSnapshot(table, branch).operation()).isEqualTo(DataOperations.APPEND);

Review Comment:
   You're right, it's a bit confusing and probably better to leave this assertion out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Use 'delete' if OverwriteFiles only deletes data files [iceberg]

2024-04-25 Thread via GitHub


nastra commented on code in PR #10150:
URL: https://github.com/apache/iceberg/pull/10150#discussion_r1579060905


##
core/src/test/java/org/apache/iceberg/TestOverwrite.java:
##
@@ -135,6 +135,49 @@ public void createTestTable() throws IOException {
     commit(table, table.newAppend().appendFile(FILE_0_TO_4).appendFile(FILE_5_TO_9), branch);
   }
 
+  @TestTemplate
+  public void deleteDataFilesProducesDeleteOperation() {
+    OverwriteFiles overwriteFiles = table.newOverwrite().deleteFile(FILE_A).deleteFile(FILE_B);
+
+    commit(table, overwriteFiles, branch);
+    Snapshot snap = latestSnapshot(table, branch);

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Test: Add test to partition on field with a dot [iceberg-python]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #610:
URL: https://github.com/apache/iceberg-python/pull/610#discussion_r1579050799


##
tests/integration/test_reads.py:
##
@@ -473,6 +474,31 @@ def test_sanitize_character(catalog: Catalog) -> None:
     assert arrow_table.schema.names[0] == table_test_table_sanitized_character.schema().fields[0].name
 
 
+@pytest.mark.integration
+@pytest.mark.parametrize('catalog', [pytest.lazy_fixture('session_catalog_hive'), pytest.lazy_fixture('session_catalog')])
+def test_sanitize_character_partitioned(catalog: Catalog) -> None:
+    table_name = "default.test_table_partitioned_sanitized_character"
+    try:
+        catalog.drop_table(table_name)
+    except NoSuchTableError:
+        pass
+
+    tbl = catalog.create_table(
+        identifier=table_name, schema=Schema(NestedField(field_id=1, name="some.id", type=IntegerType(), required=True))
+    )

Review Comment:
   Sorry for the late reply here @HonahX. Great suggestion, I've moved the test 
👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] MR: iceberg storage handler should set common projection pruning config [iceberg]

2024-04-25 Thread via GitHub


ludlows commented on code in PR #10188:
URL: https://github.com/apache/iceberg/pull/10188#discussion_r1579048510


##
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -111,8 +111,15 @@ public void configureTableJobProperties(TableDesc tableDesc, Map<String, String>
   // @Override
   public void configureInputJobCredentials(TableDesc tableDesc, Map<String, String> secrets) {}
 
+  private void setCommonJobConf(JobConf jobConf) {

Review Comment:
   Hi @pvary, thanks for your comments.
   I have adapted the code to make sure that the setting includes both the old values and `hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids`.
   Do you have any further suggestions?
   Thanks
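
   For readers following the thread, a hedged sketch of the append-rather-than-overwrite pattern being described; the helper and class names are illustrative, not the PR's actual code:

   ```
   import org.apache.hadoop.mapred.JobConf;

   public class ConfMergeSketch {
     // Merge a comma-separated value into an existing JobConf key instead of
     // clobbering whatever the engine already set there.
     static void appendCsv(JobConf conf, String key, String value) {
       String old = conf.get(key);
       conf.set(key, (old == null || old.isEmpty()) ? value : old + "," + value);
     }
   }
   ```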



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] AWS: Updating Glue catalog table removes column descriptions [iceberg]

2024-04-25 Thread via GitHub


aajisaka opened a new issue, #10220:
URL: https://github.com/apache/iceberg/issues/10220

   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   In AWS Glue Catalog, users can set arbitrary descriptions on a table and its columns via the Web UI or even from Amazon Athena. However, those descriptions are deleted after updating the table.
   
   I submitted a patch to retain the table description in 
https://github.com/apache/iceberg/pull/10199, but I would like to retain column 
descriptions as well in a separate PR later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Introduce hierarchical namespaces into SqlCatalog [iceberg-python]

2024-04-25 Thread via GitHub


Fokko commented on PR #591:
URL: https://github.com/apache/iceberg-python/pull/591#issuecomment-2076573394

   > Should we allow table identifiers to have no namespace at all, like we do 
in Java? (pending @Fokko )
   
   No, I don't think we should allow this. We also disallow this for the REST 
catalog.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] AWS: Retain Glue Catalog table description after updating Iceberg table [iceberg]

2024-04-25 Thread via GitHub


aajisaka commented on PR #10199:
URL: https://github.com/apache/iceberg/pull/10199#issuecomment-2076564061

   Also, I built the jars and manually tested on AWS Glue as a Spark runtime.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1579004695


##
crates/iceberg/src/io.rs:
##
@@ -368,6 +368,9 @@ impl Storage {
         new_props.insert("root".to_string(), DEFAULT_ROOT_PATH.to_string());
 
         match scheme {
+            Scheme::Memory => Ok(Self::LocalFs {

Review Comment:
   I try to avoid adding code just for the sake of testing; inherently you're testing a different path than the code would normally take.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on code in PR #349:
URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1578992843


##
crates/iceberg/src/transaction.rs:
##
@@ -121,6 +166,270 @@ impl<'a> Transaction<'a> {
 }
 }
 
+/// FastAppendAction is a transaction action for fast append data files to the table.
+pub struct FastAppendAction<'a> {
+    tx: Transaction<'a>,
+
+    parent_snapshot_id: Option<i64>,
+    snapshot_id: i64,
+    schema: Schema,
+    schema_id: i32,
+    format_version: FormatVersion,
+    partition_spec: PartitionSpec,
+    key_metadata: Vec<u8>,
+
+    commit_uuid: String,
+    manifest_id: i64,
+
+    appended_data_files: Vec<DataFile>,
+}
+
+impl<'a> FastAppendAction<'a> {
+    #[allow(clippy::too_many_arguments)]
+    pub(crate) fn new(
+        tx: Transaction<'a>,
+        parent_snapshot_id: Option<i64>,
+        snapshot_id: i64,
+        schema: Schema,
+        schema_id: i32,
+        format_version: FormatVersion,
+        partition_spec: PartitionSpec,
+        key_metadata: Vec<u8>,
+        commit_uuid: String,
+    ) -> Result<Self> {
+        Ok(Self {
+            tx,
+            parent_snapshot_id,
+            snapshot_id,
+            schema,
+            schema_id,
+            format_version,
+            partition_spec,
+            key_metadata,
+            commit_uuid,
+            manifest_id: 0,
+            appended_data_files: vec![],
+        })
+    }
+
+    /// Add data files to the snapshot.
+    pub fn add_data_files(
+        &mut self,
+        data_file: impl IntoIterator<Item = DataFile>,
+    ) -> Result<&mut Self> {
+        self.appended_data_files.extend(data_file);
+        Ok(self)
+    }
+
+    fn generate_manifest_file_path(&mut self) -> String {
+        let manifest_id = self.manifest_id;
+        self.manifest_id += 1;
+        format!(
+            "{}/{}/{}-m{}.{}",
+            self.tx.table.metadata().location(),
+            META_ROOT_PATH,
+            &self.commit_uuid,
+            manifest_id,
+            DataFileFormat::Avro
+        )
+    }
+
+    async fn manifest_from_parent_snapshot(&self) -> Result<Vec<ManifestFile>> {
+        if let Some(snapshot) = self.tx.table.metadata().current_snapshot() {
+            let manifest_list = snapshot
+                .load_manifest_list(self.tx.table.file_io(), &self.tx.table.metadata_ref())
+                .await?;
+            let mut manifest_files = Vec::with_capacity(manifest_list.entries().len());
+            for entry in manifest_list.entries() {
+                // From: https://github.com/apache/iceberg-python/blob/659a951d6397ab280cae80206fe6e8e4be2d3738/pyiceberg/table/__init__.py#L2921
+                // Why we need this?
+                if entry.added_snapshot_id == self.snapshot_id {
+                    continue;
+                }
+                let manifest = entry.load_manifest(self.tx.table.file_io()).await?;
+                // Skip manifest with all delete entries.
+                if manifest.entries().iter().all(|entry| !entry.is_alive()) {
+                    continue;
+                }
+                manifest_files.push(entry.clone());
+            }
+            Ok(manifest_files)
+        } else {
+            Ok(vec![])
+        }
+    }
+
+    // Write manifest file for added data files and return the ManifestFile for ManifestList.
+    async fn manifest_for_data_file(&mut self) -> Result<ManifestFile> {
+        let appended_data_files = std::mem::take(&mut self.appended_data_files);
+        let manifest_entries = appended_data_files
+            .into_iter()
+            .map(|data_file| {
+                ManifestEntry::builder()
+                    .status(crate::spec::ManifestStatus::Added)
+                    .snapshot_id(self.snapshot_id)
+                    .data_file(data_file)
+                    .build()
+            })
+            .collect();
+        let manifest_meta = ManifestMetadata::builder()
+            .schema(self.schema.clone())
+            .schema_id(self.schema_id)
+            .format_version(self.format_version)
+            .partition_spec(self.partition_spec.clone())
+            .content(crate::spec::ManifestContentType::Data)
+            .build();
+        let manifest = Manifest::new(manifest_meta, manifest_entries);
+        let writer = ManifestWriter::new(
+            self.tx
+                .table
+                .file_io()
+                .new_output(self.generate_manifest_file_path())?,
+            self.snapshot_id,
+            self.key_metadata.clone(),
+        );
+        writer.write(manifest).await
+    }
+
+    fn summary(&self) -> Summary {
+        Summary {
+            operation: crate::spec::Operation::Append,
+            other: HashMap::new(),
+        }
+    }
+
+    /// Finished building the action and apply it to the transaction.
+    pub async fn apply(mut self) -> Result<Transaction<'a>> {
+        let summary = self.summary();
+        let manifest = self.manifest_for_data_file().await?;
+        let existing_manifest_files = self.manifest_from_parent_snapshot().await?;

Re: [I] Tracking issues of iceberg-rust v0.3.0 [iceberg-rust]

2024-04-25 Thread via GitHub


Fokko commented on issue #348:
URL: https://github.com/apache/iceberg-rust/issues/348#issuecomment-2076518611

   > Hi, @Fokko About the read projection part, currently we can convert 
parquet files into arrow streams, but there are some limitations: it only 
support primitive types, and schema evolution is not supported yet. Our 
discussion is in this issue: https://github.com/apache/iceberg-rust/issues/244 
And here is the first step of projection by @viirya : 
https://github.com/apache/iceberg-rust/pull/245
   
   Thanks for the context, I've just added this to the list.
   
   > About the glue, hive, rest catalogs, I think we already have integrations:
   
   Ah yes, I forgot to check those marks, thanks!
   
   > Also as we discussed in [this 
doc](https://docs.google.com/document/d/1YncDX-qQ1T9jBGQmJNtRcPU1trRi00cB8eykv5diKw4/edit?usp=sharing),
 do you mind to add datafusion integration, python binding, wasm binding into 
futures?
   
   Certainly! Great suggestions! I'm less familiar with some of these topics (like DataFusion), so feel free to edit the post if you feel something is missing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] AWS: Retain Glue Catalog table description after updating Iceberg table [iceberg]

2024-04-25 Thread via GitHub


aajisaka commented on PR #10199:
URL: https://github.com/apache/iceberg/pull/10199#issuecomment-2076512208

   I ran the integration test using my AWS account and `testUpdateTable()` was successful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] OSSFileIO not compatible with aliyun-sdk-oss higher than 3.11.3 [iceberg]

2024-04-25 Thread via GitHub


769484623 commented on issue #9934:
URL: https://github.com/apache/iceberg/issues/9934#issuecomment-2076510336

   > Currently the version being used in Iceberg is
   > 
   > 
https://github.com/apache/iceberg/blob/b714978583f5ba32ec34396cb82b0a8f100bf837/gradle/libs.versions.toml#L24
   > 
   > 
   > That means you'd have to make sure that your application uses the same 
version as Iceberg.
   > I'm not sure who actively maintains the `OSSFileIO` but feel free to open 
a PR and upgrade the underlying version being used.
   
   Thanks @nastra.
   
   As using the same version makes it work, I'll close this issue.
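
   For anyone hitting the same incompatibility, a hedged sketch of pinning the SDK in a Gradle Kotlin build to whatever version Iceberg declares in the linked `libs.versions.toml`; the coordinates are real, but the version string below is an illustrative placeholder, so check that file for the actual one:

   ```
   // build.gradle.kts
   dependencies {
       // Keep this aligned with gradle/libs.versions.toml in the Iceberg repo;
       // "3.10.2" is a placeholder, not necessarily the current version.
       implementation("com.aliyun.oss:aliyun-sdk-oss:3.10.2")
   }
   ```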


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org


