Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10551:
URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913487801

   
   ## CI report:
   
   * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN
   * b5d8bc6da8e624931f73a12253997c1fb101e697 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22188)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7218] Integrate new HFile reader with file reader factory [hudi]

2024-01-27 Thread via GitHub


nsivabalan commented on code in PR #10330:
URL: https://github.com/apache/hudi/pull/10330#discussion_r1468774813


##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroHFileReader.java:
##
@@ -7,205 +7,136 @@
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
- *  http://www.apache.org/licenses/LICENSE-2.0
+ *   http://www.apache.org/licenses/LICENSE-2.0
  *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
  */
 
 package org.apache.hudi.io.storage;
 
-import org.apache.hudi.avro.HoodieAvroUtils;
 import org.apache.hudi.common.bloom.BloomFilter;
 import org.apache.hudi.common.bloom.BloomFilterFactory;
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.model.HoodieAvroIndexedRecord;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.util.Option;
-import org.apache.hudi.common.util.VisibleForTesting;
 import org.apache.hudi.common.util.collection.ClosableIterator;
 import org.apache.hudi.common.util.collection.CloseableMappingIterator;
 import org.apache.hudi.common.util.collection.Pair;
-import org.apache.hudi.common.util.io.ByteBufferBackedInputStream;
 import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.io.hfile.HFileReader;
+import org.apache.hudi.io.hfile.HFileReaderImpl;
+import org.apache.hudi.io.hfile.KeyValue;
+import org.apache.hudi.io.hfile.UTF8StringKey;
 import org.apache.hudi.util.Lazy;
 
 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.generic.IndexedRecord;
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.fs.PositionedReadable;
-import org.apache.hadoop.fs.Seekable;
-import org.apache.hadoop.hbase.Cell;
-import org.apache.hadoop.hbase.KeyValue;
-import org.apache.hadoop.hbase.io.hfile.CacheConfig;
-import org.apache.hadoop.hbase.io.hfile.HFile;
-import org.apache.hadoop.hbase.io.hfile.HFileInfo;
-import org.apache.hadoop.hbase.io.hfile.HFileScanner;
-import org.apache.hadoop.hbase.nio.ByteBuff;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
-import java.util.Arrays;
+import java.nio.ByteBuffer;
 import java.util.Collections;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Objects;
 import java.util.Set;
-import java.util.SortedSet;
 import java.util.TreeSet;
 import java.util.stream.Collectors;
 
-import static org.apache.hudi.common.util.CollectionUtils.toStream;
-import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes;
+import static org.apache.hudi.common.util.StringUtils.getStringFromUTF8Bytes;
 import static org.apache.hudi.common.util.TypeUtils.unsafeCast;
+import static org.apache.hudi.io.hfile.HFileUtils.isPrefixOfKey;
 
 /**
- * NOTE: PLEASE READ DOCS & COMMENTS CAREFULLY BEFORE MAKING CHANGES
- * 
- * {@link HoodieFileReader} implementation allowing to read from {@link HFile}.
+ * An implementation of {@link BaseHoodieAvroHFileReader} using built-in {@link HFileReader}.
  */
-public class HoodieAvroHFileReader extends HoodieAvroFileReaderBase implements HoodieSeekingFileReader {
-
-  // TODO HoodieHFileReader right now tightly coupled to MT, we should break that coupling
-  public static final String SCHEMA_KEY = "schema";
-  public static final String KEY_BLOOM_FILTER_META_BLOCK = "bloomFilter";
-  public static final String KEY_BLOOM_FILTER_TYPE_CODE = "bloomFilterTypeCode";
-
-  public static final String KEY_FIELD_NAME = "key";
-  public static final String KEY_MIN_RECORD = "minRecordKey";
-  public static final String KEY_MAX_RECORD = "maxRecordKey";
-
+public class HoodieAvroHFileReader extends BaseHoodieAvroHFileReader {
   private static final Logger LOG = LoggerFactory.getLogger(HoodieAvroHFileReader.class);
 
-  private final Path path;
-  private final FileSystem fs;
-  private final Configuration hadoopConf;
-  private final CacheConfig config;
-  private final Option content;
+  private final Configuration conf;
+  private final Option path;
+  private final Option bytesContent;
+  private Option shar

[jira] [Updated] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows

2024-01-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated HUDI-7354:

Attachment: cleanup.sql

> Flink Batch Read from Hudi table does not return any rows
> -
>
> Key: HUDI-7354
> URL: https://issues.apache.org/jira/browse/HUDI-7354
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink-sql
>Affects Versions: 0.14.1
>Reporter: Prabhu Joseph
>Priority: Major
> Attachments: cleanup.sql, flink-hudi.sql
>
>
> Flink Batch Read from Hudi table does not return any rows. The same flink sql 
> script returns 8 rows as expected on 0.14.0 Hudi version.
> *Repro Steps*
>  1. Flink 1.18.1 and Hudi 0.14.0
> 2. Open Flink YARN Session
> {code}
> flink-yarn-session -d -D execution.checkpointing.interval=10s -D 
> state.checkpoint-storage=filesystem  -D 
> state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
> {code}
> 3. Place CSV Input Data
> {code}
> cat > data <<EOF
> 1,Danny,23
> 2,Stephen,33
> 3,Julian,53
> 4,Fabian,31
> 5,Sophia,18
> 6,Emma,20
> 7,Bob,44
> 8,Han,56
> EOF
> hadoop fs -mkdir -p 
> s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> hadoop fs -put data 
> s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> {code}
> 4. Run attached Flink sql (flink-hudi.sql) script
> {code}
> /usr/lib/flink/bin/sql-client.sh -f flink-hudi.sql
> {code}
> The script makes a flink filesystem table with CSV data of 8 rows. Then, it 
> forms a Hudi table and puts in the data from the filesystem table. Finally, 
> it runs a select query from the Hudi table. The select query does not return 
> any data.
> 5. Cleanup the tables and databases using cleanup.sql
> *Analysis*
> The select query and insert query run together. The select query ends quickly 
> since the Hudi table has no data yet. In Hudi 0.14.0, the select query waits 
> until the data loads and then retrieves it.
>  
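The race described in the analysis can be illustrated with a small, self-contained Java sketch. It is a toy model only: `BatchReadRaceSketch`, `submitInsert`, and `batchReadCount` are hypothetical names, not Flink or Hudi APIs, and a latch stands in for the time the insert job needs to commit. A bounded read issued before the commit sees no rows; a read issued after waiting for the insert sees all of them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class BatchReadRaceSketch {
  // Rows visible to readers; a stand-in for committed table data (assumption of this sketch).
  private final List<String> committedRows = new CopyOnWriteArrayList<>();

  // Starts an asynchronous "insert" that commits only after the gate is released.
  public CompletableFuture<Void> submitInsert(List<String> rows, CountDownLatch commitGate) {
    return CompletableFuture.runAsync(() -> {
      try {
        commitGate.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
      committedRows.addAll(rows);
    });
  }

  // A bounded (batch) read: returns whatever is committed right now and ends.
  public int batchReadCount() {
    return committedRows.size();
  }

  public static void main(String[] args) {
    BatchReadRaceSketch table = new BatchReadRaceSketch();
    CountDownLatch commitGate = new CountDownLatch(1);
    CompletableFuture<Void> insert = table.submitInsert(
        Arrays.asList("Danny", "Stephen", "Julian", "Fabian", "Sophia", "Emma", "Bob", "Han"),
        commitGate);

    int readBeforeCommit = table.batchReadCount(); // insert is still pending
    commitGate.countDown();                        // let the insert commit
    insert.join();                                 // wait for it, then read again
    int readAfterCommit = table.batchReadCount();

    System.out.println(readBeforeCommit + " rows, then " + readAfterCommit + " rows");
  }
}
```

Running `main` prints `0 rows, then 8 rows`, which mirrors the reported symptom (an empty result when the select does not wait) and the expected result (8 rows once the insert has finished).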



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows

2024-01-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated HUDI-7354:

Attachment: flink-hudi.sql






[jira] [Updated] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows

2024-01-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated HUDI-7354:

Description: 
Flink Batch Read from Hudi table does not return any rows. The same flink sql 
script returns 8 rows as expected on 0.14.0 Hudi version.


*Repro Steps*

 1. Flink 1.18.1 and Hudi 0.14.0

2. Open Flink YARN Session
{code}
flink-yarn-session -d -D execution.checkpointing.interval=10s -D 
state.checkpoint-storage=filesystem  -D 
state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
{code}

3. Place CSV Input Data
{code}
cat > data <<EOF

> Flink Batch Read from Hudi table does not return any rows
> -
>
> Key: HUDI-7354
> URL: https://issues.apache.org/jira/browse/HUDI-7354
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink-sql
>Affects Versions: 0.14.1
>Reporter: Prabhu Joseph
>Priority: Major
>
> Flink Batch Read from Hudi table does not return any rows. The same flink sql 
> script returns 8 rows as expected on 0.14.0 Hudi version.
> *Repro Steps*
>  1. Flink 1.18.1 and Hudi 0.14.0
> 2. Open Flink YARN Session
> {code}
> flink-yarn-session -d -D execution.checkpointing.interval=10s -D 
> state.checkpoint-storage=filesystem  -D 
> state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
> {code}
> 3. Place CSV Input Data
> {code}
> cat > data <<EOF
> 1,Danny,23
> 2,Stephen,33
> 3,Julian,53
> 4,Fabian,31
> 5,Sophia,18
> 6,Emma,20
> 7,Bob,44
> 8,Han,56
> EOF
> hadoop fs -mkdir -p 
> s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> hadoop fs -put data 
> s3://prabhuflinks3/test-output/flink/output/8d007d79-913d-4ed4-a6e4-9af591f24c36/csvinput/
> {code}
> 4. Run attached Flink sql (flink-hudi.sql) script
> {code}
> /usr/lib/flink/bin/sql-client.sh -f flink-hudi.sql
> {code}
> The script makes a flink filesystem table with CSV data of 8 rows. Then, it 
> forms a Hudi table and puts in the data from the filesystem table. Finally, 
> it runs a select query from the Hudi table. The select query does not return 
> any data.
> 5. Cleanup the tables and databases using cleanup.sql
> *Analysis*
> The select query and insert query run together. The select query ends quickly 
> since the Hudi table has no data yet. In Hudi 0.14.0, the select query waits 
> until the data loads and then retrieves it.
>  





[jira] [Created] (HUDI-7354) Flink Batch Read from Hudi table does not return any rows

2024-01-27 Thread Prabhu Joseph (Jira)
Prabhu Joseph created HUDI-7354:
---

 Summary: Flink Batch Read from Hudi table does not return any rows
 Key: HUDI-7354
 URL: https://issues.apache.org/jira/browse/HUDI-7354
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink-sql
Affects Versions: 0.14.1
Reporter: Prabhu Joseph


Flink Batch Read from Hudi table does not return any rows. The same flink sql 
script returns 8 rows as expected on 0.14.0 Hudi version.


*Repro Steps*

 1. Flink 1.18.1 and Hudi 0.14.0

2. Open Flink YARN Session
{code}
flink-yarn-session -d -D execution.checkpointing.interval=10s -D 
state.checkpoint-storage=filesystem  -D 
state.checkpoints.dir=s3://prabhuflinks3/test-output/flink/output/20eab3b1-d58a-491c-8819-15e451a549eb
{code}

3. Place CSV Input Data
{code}
cat > data <<EOF

Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10551:
URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913477595

   
   ## CI report:
   
   * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN
   * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187)
 
   * b5d8bc6da8e624931f73a12253997c1fb101e697 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22188)
 
   
   



Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10551:
URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913476402

   
   ## CI report:
   
   * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN
   * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187)
 
   * b5d8bc6da8e624931f73a12253997c1fb101e697 UNKNOWN
   
   



[jira] [Updated] (HUDI-7353) Fix TestOrcBootstrap and TestBootstrap

2024-01-27 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7353:
--
Description: Both tests failed when we used different FSV storage types.  
(was: The test would fail when we use a different FSV storage type.)

> Fix TestOrcBootstrap and TestBootstrap
> --
>
> Key: HUDI-7353
> URL: https://issues.apache.org/jira/browse/HUDI-7353
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Assignee: Y Ethan Guo
>Priority: Major
>
> Both tests failed when we used different FSV storage types.





[jira] [Updated] (HUDI-7353) Fix TestOrcBootstrap and TestBootstrap

2024-01-27 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7353:
--
Summary: Fix TestOrcBootstrap and TestBootstrap  (was: Fix TestOrcBootstrap)

> Fix TestOrcBootstrap and TestBootstrap
> --
>
> Key: HUDI-7353
> URL: https://issues.apache.org/jira/browse/HUDI-7353
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Assignee: Y Ethan Guo
>Priority: Major
>
> The test would fail when we use a different FSV storage type.





Re: [PR] [HUDI-7335] Create hudi-hadoop-common for hadoop-specific implementation [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10564:
URL: https://github.com/apache/hudi/pull/10564#issuecomment-1913368369

   
   ## CI report:
   
   * 7143fc0d8b881c9d28040d2b9c290c4d8b2e0f54 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22185)
 
   
   



Re: [PR] [HUDI-7336][RFR|DNM] Introduce new HoodieStorage abstraction [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10567:
URL: https://github.com/apache/hudi/pull/10567#issuecomment-1913368388

   
   ## CI report:
   
   * ea050a0d021273313ef3d9e5e3c566186e0b28ee Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22186)
 
   
   



Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10551:
URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913342966

   
   ## CI report:
   
   * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN
   * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187)
 
   
   



Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10551:
URL: https://github.com/apache/hudi/pull/10551#issuecomment-191580

   
   ## CI report:
   
   * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN
   * 4612643dfe4861c49642612a3918b67ed0785eda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22164)
 
   * 09980192013540f9465d1549022bd06de0414238 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22184)
 
   * 977647a0ed72e2a2e8203924aa9a621463c9a193 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22187)
 
   
   



Re: [PR] [HUDI-7336][RFR|DNM] Introduce new HoodieStorage abstraction [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10567:
URL: https://github.com/apache/hudi/pull/10567#issuecomment-1913331895

   
   ## CI report:
   
   * 8b765537863e155372fed079ea520f0a91a45b84 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22170)
 
   * ea050a0d021273313ef3d9e5e3c566186e0b28ee Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22186)
 
   
   



Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10551:
URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913331858

   
   ## CI report:
   
   * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN
   * 4612643dfe4861c49642612a3918b67ed0785eda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22164)
 
   * 09980192013540f9465d1549022bd06de0414238 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22184)
 
   * 977647a0ed72e2a2e8203924aa9a621463c9a193 UNKNOWN
   
   



Re: [PR] [HUDI-7335] Create hudi-hadoop-common for hadoop-specific implementation [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10564:
URL: https://github.com/apache/hudi/pull/10564#issuecomment-1913331873

   
   ## CI report:
   
   * 5b1fb24ea8a64c0373fa8e901802d6f3d0f5ff33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22163)
 
   * 7143fc0d8b881c9d28040d2b9c290c4d8b2e0f54 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22185)
 
   
   



[jira] [Created] (HUDI-7353) Fix TestOrcBootstrap

2024-01-27 Thread Lin Liu (Jira)
Lin Liu created HUDI-7353:
-

 Summary: Fix TestOrcBootstrap
 Key: HUDI-7353
 URL: https://issues.apache.org/jira/browse/HUDI-7353
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Lin Liu
Assignee: Y Ethan Guo


The test would fail when we use a different FSV storage type.





Re: [PR] [HUDI-7336][RFR|DNM] Introduce new HoodieStorage abstraction [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10567:
URL: https://github.com/apache/hudi/pull/10567#issuecomment-1913330268

   
   ## CI report:
   
   * 8b765537863e155372fed079ea520f0a91a45b84 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22170)
 
   * ea050a0d021273313ef3d9e5e3c566186e0b28ee UNKNOWN
   
   



Re: [PR] [HUDI-7335] Create hudi-hadoop-common for hadoop-specific implementation [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10564:
URL: https://github.com/apache/hudi/pull/10564#issuecomment-1913330252

   
   ## CI report:
   
   * 5b1fb24ea8a64c0373fa8e901802d6f3d0f5ff33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22163)
 
   * 7143fc0d8b881c9d28040d2b9c290c4d8b2e0f54 UNKNOWN
   
   



Re: [PR] [HUDI-7334] Remove EMBEDDED_KV_STORE based FSV usage in tests [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10551:
URL: https://github.com/apache/hudi/pull/10551#issuecomment-1913330242

   
   ## CI report:
   
   * 1a51332ae5c8ea94b303c1e0084cf4c33cba5113 UNKNOWN
   * 4612643dfe4861c49642612a3918b67ed0785eda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22164)
 
   * 09980192013540f9465d1549022bd06de0414238 UNKNOWN
   
   



Re: [PR] [HUDI-7351] Hive-sync partition pushdown does not work with glue [hudi]

2024-01-27 Thread via GitHub


parisni commented on code in PR #10572:
URL: https://github.com/apache/hudi/pull/10572#discussion_r1468632782


##
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##
@@ -578,6 +578,14 @@ public Map<String, String> getMetastoreSchema(String tableName) {
 }
   }
 
+  @Override
+  public List<FieldSchema> getMetastoreFieldSchemas(String tableName) {
+    Map<String, String> schema = getMetastoreSchema(tableName);
+    return schema.entrySet().stream()
+        .map(f -> new FieldSchema(f.getKey(), f.getValue()))
+        .collect(Collectors.toList());
+  }

Review Comment:
   This needs an integration test against a Glue endpoint. It might be possible 
with moto, but that is a large task. I can work on it in another PR. I hope 
that would make the aws module more stable.






[jira] [Updated] (HUDI-7348) Replace Configuration with StorageConfiguration for storage configuration

2024-01-27 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7348:

Summary: Replace Configuration with StorageConfiguration for storage 
configuration  (was: Replace Configuration with TypedProperties for storage 
configuration)

> Replace Configuration with StorageConfiguration for storage configuration
> -
>
> Key: HUDI-7348
> URL: https://issues.apache.org/jira/browse/HUDI-7348
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 1.0.0
>
>






Re: [PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10577:
URL: https://github.com/apache/hudi/pull/10577#issuecomment-1913306009

   
   ## CI report:
   
   * ff12f8a7d10731760db2cfab799618a406507979 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22183)
 
   
   



Re: [PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10577:
URL: https://github.com/apache/hudi/pull/10577#issuecomment-1913274310

   
   ## CI report:
   
   * ff12f8a7d10731760db2cfab799618a406507979 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22183)
 
   
   



Re: [PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-27 Thread via GitHub


hudi-bot commented on PR #10577:
URL: https://github.com/apache/hudi/pull/10577#issuecomment-1913272241

   
   ## CI report:
   
   * ff12f8a7d10731760db2cfab799618a406507979 UNKNOWN
   
   



[PR] Hudi 6868 - Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-27 Thread via GitHub


ad1happy2go opened a new pull request, #10577:
URL: https://github.com/apache/hudi/pull/10577

   ### Change Logs
   
   Hive Sync could not extract the password from the Hadoop credential store. 
This change adds logic to read the password from there when it is available.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[I] [BUG] Failure Encountered When Reading Hudi with Flink in Batch Runtime Mode and FlinkOptions.READ_AS_STREAMING=false [hudi]

2024-01-27 Thread via GitHub


ailinzhou opened a new issue, #10576:
URL: https://github.com/apache/hudi/issues/10576

   I am currently experiencing an issue when attempting to read Hudi with 
Flink. The problem arises when I configure the Flink RuntimeMode as 'batch' and 
set the Hudi FlinkOptions.READ_AS_STREAMING to 'false'.
   
   **To Reproduce**
   
   1. Set Flink RuntimeMode to 'batch'. 
   2. Set Hudi FlinkOptions.READ_AS_STREAMING to 'false'. 
   3. Attempt to read Hudi with Flink.
   
   **Expected behavior**
   
   I expected to read the Hudi table successfully in batch mode with Flink under 
these configurations.
   
   **Actual behavior**
   
   A failure occurs when attempting to read Hudi with Flink under these 
configurations.
   
   
   **Environment Description**
   
   * Hudi version : From 0.10 ~ 0.14
   
   * Flink version: 1.13
   
   **Additional context**
   
   In the `HoodieTableSource` implementation for Flink's `DynamicTableSource`, 
a `ScanRuntimeProvider` is provided. This `ScanRuntimeProvider` implements the 
`produceDataStream` method, which generates a `DataStreamSource`. However, in 
bounded mode it does not explicitly specify the `Boundedness` parameter, so 
Flink falls back to its default, `Boundedness.CONTINUOUS_UNBOUNDED`. This 
could be the cause of the issue.
   
   [Code at Hudi HoodieTableSource.java](https://github.com/apache/hudi/blob/4c7ac6112daab349ebcdd1fbb2216d9d1138ca14/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java#L227C1-L228C1)
   
   ``` java
   if (conf.getBoolean(FlinkOptions.READ_AS_STREAMING)) {
     ...
   } else {
     ...
     DataStreamSource source = execEnv.addSource(func, asSummaryString(), typeInfo);
     ...
   }
   
   ```
   Perhaps the code could be modified as follows:
   
   ``` java
   if (!isBounded()) {
     ...
   } else {
     ...
     DataStreamSource source = execEnv.addSource(func, asSummaryString(), typeInfo, Boundedness.BOUNDED);
     ...
   }
   
   ```
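
   The failure mode described above can be illustrated with a small 
self-contained model. This is only a sketch: `BoundednessSketch`, its enums and 
its methods are hypothetical stand-ins written for this example, not Flink or 
Hudi classes. It assumes, as stated above, that a source registered without an 
explicit boundedness is treated as unbounded.

```java
// Hypothetical, simplified model of the check; not Flink or Hudi code.
final class BoundednessSketch {
    enum RuntimeMode { STREAMING, BATCH }
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    // Models registering a source without stating its boundedness:
    // the source is treated as unbounded (assumption taken from the issue text).
    static Boundedness addSource() {
        return Boundedness.CONTINUOUS_UNBOUNDED;
    }

    // Models registering a source with an explicit boundedness.
    static Boundedness addSource(Boundedness boundedness) {
        return boundedness;
    }

    // Rejects the combination reported in the stack trace: BATCH mode with an unbounded source.
    static void validate(RuntimeMode mode, Boundedness source) {
        if (mode == RuntimeMode.BATCH && source == Boundedness.CONTINUOUS_UNBOUNDED) {
            throw new IllegalStateException(
                "Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'.");
        }
    }

    // Convenience wrapper that reports whether the combination passes validation.
    static boolean isAllowed(RuntimeMode mode, Boundedness source) {
        try {
            validate(mode, source);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

   In this model, `isAllowed(RuntimeMode.BATCH, addSource())` is rejected, while 
`isAllowed(RuntimeMode.BATCH, addSource(Boundedness.BOUNDED))` passes, which 
mirrors the effect of the proposed change.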
   
   **Stacktrace**
   
   ``` java
   
   Caused by: java.lang.IllegalStateException: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'. This combination is not allowed, please set the 'execution.runtime-mode' to STREAMING or AUTOMATIC
   
   org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'. This combination is not allowed, please set the 'execution.runtime-mode' to STREAMING or AUTOMATIC
       at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:381)
       at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:223)
       at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
       at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
       at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
       at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
       at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
       at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
       at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
   
   ```
   





Re: [I] Issue with reading the debezium inputs [hudi]

2024-01-27 Thread via GitHub


ad1happy2go commented on issue #10561:
URL: https://github.com/apache/hudi/issues/10561#issuecomment-1913161095

   @zyperd Thanks a lot. As discussed, can you create a PR that adds a catch 
block for the case where we can't instantiate using the Metrics provider and 
prints an error message?
   
   I raised a Jira for this. Let me know if you need any help. Thanks.
   https://issues.apache.org/jira/browse/HUDI-7352
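
   A rough sketch of the kind of catch block being requested is shown below. 
It is an illustration only: the thread does not name the Hudi class involved, 
so `MetricsProviderSketch`, `MetricsProvider`, `NoOpProvider` and 
`tryInstantiate` are hypothetical names, and `java.util.logging` is used here 
just to keep the example dependency-free.

```java
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; none of these types are Hudi classes.
final class MetricsProviderSketch {
    private static final Logger LOG = Logger.getLogger(MetricsProviderSketch.class.getName());

    // Stand-in for whatever provider type is being instantiated.
    interface MetricsProvider {}

    // Trivial provider with a public no-argument constructor, used for the happy path.
    static final class NoOpProvider implements MetricsProvider {
        public NoOpProvider() {}
    }

    // Tries to build the provider reflectively. On failure it logs a WARN-level
    // message with the cause instead of failing silently, and returns empty.
    static Optional<MetricsProvider> tryInstantiate(String className) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return Optional.of((MetricsProvider) instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            LOG.log(Level.WARNING, "Could not instantiate metrics provider " + className, e);
            return Optional.empty();
        }
    }
}
```

   Returning an empty `Optional` lets the caller decide on a fallback, while the 
logged warning makes the failed instantiation visible.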





[jira] [Created] (HUDI-7352) Add WARN log in catch block when not able to instantiate with Metrics Provider

2024-01-27 Thread Aditya Goenka (Jira)
Aditya Goenka created HUDI-7352:
---

 Summary: Add WARN log in catch block when not able to instantiate 
with Metrics Provider
 Key: HUDI-7352
 URL: https://issues.apache.org/jira/browse/HUDI-7352
 Project: Apache Hudi
  Issue Type: Improvement
  Components: deltastreamer
Reporter: Aditya Goenka
 Fix For: 1.1.0


Github Issue - [https://github.com/apache/hudi/issues/10561]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)