Re: [DISCUSS][HDFS] Add rust binding for libhdfs
> What is libdirent? How is it relevant in this context?

Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does not provide this header, which causes issues when building libhdfs on Windows. To solve this problem, hdfs-sys uses [libdirent], an MSVC port of the dirent.h API for Windows. Fortunately, HDFS has already done similar work in [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can migrate to HDFS's own implementation instead.

> How tightly coupled is it to a specific Hadoop version?

Thanks to libhdfs's stable API, there is no breakage between different hadoop versions (only additions). So the version matrix looks like:

- libhdfs-rust (feature flag: v2_2) can access hadoop v2.2 ~ v3.3 ...
- libhdfs-rust (feature flag: v2_10) can access hadoop v2.10 ~ v3.3 ...
- libhdfs-rust (feature flag: v3_3) can access hadoop v3.3

> The concern I have as a release manager is that it makes my life harder to
> ensure the quality of a language binding that I am not familiar with.

Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool developed by the Rust team to automatically generate Rust FFI bindings for C (and some C++) libraries. The other parts are related to building and linking, similar to a Makefile, such as finding libjvm and libhdfs. In general, the task that libhdfs-rust performs is simple: it exposes an API to Rust and links it with libhdfs.so, which I believe is easy to test.
[libdirent]: https://github.com/tronkko/dirent
[native/libhdfspp/lib/x-platform]: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
[rust-bindgen]: https://github.com/rust-lang/rust-bindgen

On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> Inline
>
> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena wrote:
>> Forwarding from dev@hadoop to relevant ML
>>
>> Original mail:
>> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>>
>> -Ayush
>>
>> On 2023/07/15 09:18:42 Xuanwo wrote:
>> > Hello, everyone.
>> >
>> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for Rust.
>> > I want to know is it a good idea of accepting hdfs-sys as a part of hadoop
>> > project?
>> >
>> > Users of hdfs-sys for now:
>> >
>> > - [OpenDAL]: An Apache Incubator project that allows users to easily and
>> >   efficiently retrieve data from various storage services in a unified way.
>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost and
>> >   complexity for your massive-scale analytics needs. (via OpenDAL)
>> > - [RisingWave]: The distributed streaming database: SQL stream processing
>> >   with Postgres-like experience. (via OpenDAL)
>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse framework
>> >
>> > Licenses information of hdfs-sys:
>> >
>> > - hdfs-sys itself licensed under Apache-2.0
>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1,
>> >   hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all dual
>> >   licensed under Apache-2.0 and MIT.
>> >
>> > Works need to do if accept:
>> >
>> > - Replace libdirent with the same dirent API implemented in HDFS project.
>> > - Remove all bundled hdfs C code.
>
> What is libdirent? How is it relevant in this context?
>
> How tightly coupled is it to a specific Hadoop version? I am wondering if
> it's possible to host it in a separate Hadoop repo, if it's accepted. The
> concern I have as a release manager is that it makes my life harder to ensure
> the quality of a language binding that I am not familiar with.
>
>> >
>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
>> > [OpenDAL]: https://github.com/apache/incubator-opendal
>> > [Databend]: https://github.com/datafuselabs/databend
>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
>> >
>> > Xuanwo
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
>> > For additional commands, e-mail: dev-h...@hadoop.apache.org
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Xuanwo
Apache Hadoop qbt Report: trunk+JDK11 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/521/

[Jul 15, 2023, 6:30:07 AM] (github) HDFS-17086. Fix the parameter settings in TestDiskspaceQuotaUpdate#updateCountForQuota (#5842). Contributed by Haiyang Hu.
[Jul 16, 2023, 4:20:46 AM] (github) HADOOP-18801. Delete path directly when it can not be parsed in trash. (#5744). Contributed by farmmamba.
[Jul 16, 2023, 5:57:31 AM] (github) HDFS-17075. Reconfig disk balancer parameters for datanode (#5823). Contributed by Haiyang Hu.

-1 overall

The following subsystems voted -1:
    blanks hadolint mvnsite pathlen spotbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    XML : Parsing Error(s):
        hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    spotbugs : module:hadoop-hdfs-project/hadoop-hdfs
        Redundant nullcheck of oldLock, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory) Redundant null check at DataStorage.java:[line 695]
        Redundant nullcheck of metaChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long, FileInputStream, FileChannel, String) Redundant null check at MappableBlockLoader.java:[line 138]
        Redundant nullcheck of blockChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at MemoryMappableBlockLoader.java:[line 75]
        Redundant nullcheck of blockChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at NativePmemMappableBlockLoader.java:[line 85]
        Redundant nullcheck of metaChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$PmemMappedRegion, long, FileInputStream, FileChannel, String) Redundant null check at NativePmemMappableBlockLoader.java:[line 130]
        org.apache.hadoop.hdfs.server.namenode.top.window.RollingWindowManager$UserCounts doesn't override java.util.ArrayList.equals(Object) At RollingWindowManager.java:[line 1]

    spotbugs : module:hadoop-yarn-project/hadoop-yarn
        Redundant nullcheck of it, which is known to be non-null in
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1091/

No changes

ERROR: File 'out/email-report.txt' does not exist

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/

[Jul 16, 2023, 4:20:46 AM] (github) HADOOP-18801. Delete path directly when it can not be parsed in trash. (#5744). Contributed by farmmamba.
[Jul 16, 2023, 5:57:31 AM] (github) HDFS-17075. Reconfig disk balancer parameters for datanode (#5823). Contributed by Haiyang Hu.

-1 overall

The following subsystems voted -1:
    blanks hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    XML : Parsing Error(s):
        hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    Failed junit tests:
        hadoop.mapreduce.v2.TestUberAM
        hadoop.mapreduce.v2.TestMRJobsWithProfiler
        hadoop.mapreduce.v2.TestMRJobs

    cc: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-compile-cc-root.txt [96K]
    javac: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-compile-javac-root.txt [12K]
    blanks: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/blanks-eol.txt [15M]
            https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/blanks-tabs.txt [2.0M]
    checkstyle: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-checkstyle-root.txt [13M]
    hadolint: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-hadolint.txt [20K]
    pathlen: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-pathlen.txt [16K]
    pylint: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-pylint.txt [20K]
    shellcheck: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-shellcheck.txt [24K]
    xml: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/xml.txt [24K]
    javadoc: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-javadoc-javadoc-root.txt [244K]
    unit: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [72K]

Powered by Apache Yetus 0.14.0-SNAPSHOT
https://yetus.apache.org
Re: [DISCUSS][HDFS] Add rust binding for libhdfs
Inline

On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena wrote:
> Forwarding from dev@hadoop to relevant ML
>
> Original mail:
> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>
> -Ayush
>
> On 2023/07/15 09:18:42 Xuanwo wrote:
> > Hello, everyone.
> >
> > I'm the maintainer of [hdfs-sys]: A binding to HDFS Native C API for Rust.
> > I want to know is it a good idea of accepting hdfs-sys as a part of hadoop
> > project?
> >
> > Users of hdfs-sys for now:
> >
> > - [OpenDAL]: An Apache Incubator project that allows users to easily and
> >   efficiently retrieve data from various storage services in a unified way.
> > - [Databend]: A modern cloud data warehouse focusing on reducing cost and
> >   complexity for your massive-scale analytics needs. (via OpenDAL)
> > - [RisingWave]: The distributed streaming database: SQL stream processing
> >   with Postgres-like experience. (via OpenDAL)
> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse framework
> >
> > Licenses information of hdfs-sys:
> >
> > - hdfs-sys itself licensed under Apache-2.0
> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1,
> >   hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0, they are all dual
> >   licensed under Apache-2.0 and MIT.
> >
> > Works need to do if accept:
> >
> > - Replace libdirent with the same dirent API implemented in HDFS project.
> > - Remove all bundled hdfs C code.

What is libdirent? How is it relevant in this context?

How tightly coupled is it to a specific Hadoop version? I am wondering if
it's possible to host it in a separate Hadoop repo, if it's accepted. The
concern I have as a release manager is that it makes my life harder to ensure
the quality of a language binding that I am not familiar with.

> >
> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> > [OpenDAL]: https://github.com/apache/incubator-opendal
> > [Databend]: https://github.com/datafuselabs/databend
> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> >
> > Xuanwo
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: dev-h...@hadoop.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-17092) Datanode Full Block Report failed can lead to missing and under replicated blocks
microle.dong created HDFS-17092:
-----------------------------------

Summary: Datanode Full Block Report failed can lead to missing and under replicated blocks
Key: HDFS-17092
URL: https://issues.apache.org/jira/browse/HDFS-17092
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: microle.dong

When restarting the namenode, we found that some datanodes did not report all of their blocks, which can lead to missing and under-replicated blocks. The logs of a datanode with incomplete block reporting show that its first FBR attempt failed due to a namenode error:

{code:java}
2023-07-14 11:29:24,776 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x7b738b02996cd2, containing 12 storage report(s), of which we sent 1. The reports had 633033 total blocks and used 1 RPC(s). This took 169 msec to generate and 97730 msecs for RPC and NN processing. Got back no commands.
2023-07-14 11:29:24,776 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.SocketTimeoutException: Call From x.x.x.x/x.x.x.x to x.x.x.x:9002 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/x.x.x.x:13868 remote=x.x.x.x/x.x.x.x:9002]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:863)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:822)
	at org.apache.hadoop.ipc.Client.call(Client.java:1480)
	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy14.blockReport(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:205)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:333)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:572)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:706)
	at java.lang.Thread.run(Thread.java:745){code}

The datanode's second FBR will use the same lease, which makes the namenode remove the datanode's lease (just as in HDFS-8930), so the second FBR fails because no lease is left. We should request a new lease and try again when a datanode FBR fails. I am willing to submit a PR to fix this.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)