Re: [DISCUSS][HDFS] Add rust binding for libhdfs

2023-07-17 Thread Xuanwo
> What is libdirent? How is it relevant in this context? 

Since version 3.3, libhdfs depends on the dirent.h API. However, MSVC does not 
provide this header, which breaks libhdfs builds on Windows. To work around 
this, hdfs-sys uses [libdirent], an MSVC port of the dirent.h API for Windows.
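
To make this concrete, here is a minimal build-script sketch of how such a 
shim can be wired in with the cc crate (which the thread below already lists 
as a dependency); the file paths and target names are illustrative, not 
hdfs-sys's actual layout:

// build.rs -- illustrative sketch; paths are hypothetical.
fn main() {
    let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap();
    let mut build = cc::Build::new();
    build.file("hdfs/hdfs.c").include("hdfs/include");
    if target_os == "windows" {
        // MSVC ships no dirent.h, so add the header-only shim's include
        // path (libdirent today; HDFS's own x-platform port later).
        build.include("third_party/dirent/include");
    }
    build.compile("hdfs");
}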

Fortunately, Hadoop has already done similar work in 
[native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we can migrate 
to that in-tree implementation instead.

> How tightly coupled is it to a specific Hadoop version?

Thanks to libhdfs's stable API, there is no breakage between different Hadoop 
versions (only additions). So the version matrix looks like this (a 
compile-time sketch follows the list):

- libhdfs-rust (feature flag: v2_2) can access hadoop v2.2 ~ v3.3
...
- libhdfs-rust (feature flag: v2_10) can access hadoop v2.10 ~ v3.3
...
- libhdfs-rust (feature flag: v3_3) can access hadoop v3.3
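
As a sketch of how additive feature flags can enforce that matrix at compile 
time (this is not hdfs-sys's actual source; hdfsConnect is a real libhdfs 
function, but the opaque hdfsFS handle is simplified to a raw pointer here):

use std::os::raw::{c_char, c_void};

// Each version feature is additive: enabling v3_3 also enables v2_2.
#[cfg(feature = "v2_2")]
extern "C" {
    // Available since Hadoop 2.2.
    pub fn hdfsConnect(nn: *const c_char, port: u16) -> *mut c_void;
}

#[cfg(feature = "v3_3")]
extern "C" {
    // Declarations for functions added after 2.2 would live here, so a
    // binary built with only v2_2 never references symbols that an
    // older libhdfs.so does not export.
}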

> The concern I have as a release manager is that it makes my life harder to 
> ensure the quality of a language binding that I am not familiar with.

Most of the code in libhdfs-rust is generated by [rust-bindgen], a tool 
developed by the Rust team that automatically generates Rust FFI bindings for 
C (and some C++) libraries. The rest is build-and-link plumbing, similar to a 
Makefile, such as locating libjvm and libhdfs.

In general, the task libhdfs-rust performs is simple: it exposes the libhdfs 
API to Rust and links against libhdfs.so, which I believe is easy to test.
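
For a sense of scale, the whole build step boils down to roughly the 
following sketch. The HADOOP_HOME/JAVA_HOME layouts are assumptions about a 
typical installation, not hdfs-sys's exact logic, which discovers paths more 
robustly (e.g. via java-locator):

// build.rs -- condensed sketch; paths are illustrative assumptions.
fn main() {
    // Tell rustc where libhdfs.so and libjvm.so live, and link them.
    let hadoop_home = std::env::var("HADOOP_HOME").expect("HADOOP_HOME not set");
    println!("cargo:rustc-link-search=native={hadoop_home}/lib/native");
    println!("cargo:rustc-link-lib=dylib=hdfs");
    if let Ok(java_home) = std::env::var("JAVA_HOME") {
        // One common libjvm location on Linux JDKs.
        println!("cargo:rustc-link-search=native={java_home}/lib/server");
    }
    println!("cargo:rustc-link-lib=dylib=jvm");

    // Generate the FFI declarations straight from the C header.
    let out = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
    bindgen::Builder::default()
        .header(format!("{hadoop_home}/include/hdfs.h"))
        .generate()
        .expect("failed to generate bindings")
        .write_to_file(out.join("bindings.rs"))
        .expect("failed to write bindings");
}

The generated declarations are then pulled into the crate with the usual 
include!(concat!(env!("OUT_DIR"), "/bindings.rs")) pattern.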

[libdirent]: https://github.com/tronkko/dirent
[native/libhdfspp/lib/x-platform]: 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
[rust-bindgen]: https://github.com/rust-lang/rust-bindgen


On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> Inline
> 
> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena  wrote:
>> Forwarding from dev@hadoop to relevant ML
>> 
>> Original mail: 
>> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>> 
>> -Ayush
>> 
>> On 2023/07/15 09:18:42 Xuanwo wrote:
>> > Hello, everyone.
>> >
>> > I'm the maintainer of [hdfs-sys], a binding to the HDFS Native C API for 
>> > Rust. I want to know: would it be a good idea to accept hdfs-sys as a part 
>> > of the hadoop project?
>> >
>> > Users of hdfs-sys for now:
>> >
>> > - [OpenDAL]: An Apache Incubator project that allows users to easily and 
>> > efficiently retrieve data from various storage services in a unified way.
>> > - [Databend]: A modern cloud data warehouse focusing on reducing cost and 
>> > complexity for your massive-scale analytics needs. (via OpenDAL)
>> > - [RisingWave]: The distributed streaming database: SQL stream processing 
>> > with Postgres-like experience. (via OpenDAL)
>> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse framework
>> >
>> > License information for hdfs-sys:
>> >
>> > - hdfs-sys itself is licensed under Apache-2.0
>> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1, 
>> > hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0; they are all 
>> > dual-licensed under Apache-2.0 and MIT.
>> >
>> > Work needed if accepted:
>> >
>> > - Replace libdirent with the dirent API already implemented in the HDFS project.
>> > - Remove all bundled hdfs C code.
> What is libdirent? How is it relevant in this context? 
> 
> How tightly coupled is it to a specific Hadoop version? I am wondering if 
> it's possible to host it in a separate Hadoop repo, if it's accepted. The 
> concern I have as a release manager is that it makes my life harder to ensure 
> the quality of a language binding that I am not familiar with.
>> >
>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
>> > [OpenDAL]: https://github.com/apache/incubator-opendal
>> > [Databend]: https://github.com/datafuselabs/databend
>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
>> >
>> > Xuanwo
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
>> > For additional commands, e-mail: dev-h...@hadoop.apache.org
>> >
>> >
>> 
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Xuanwo


Apache Hadoop qbt Report: trunk+JDK11 on Linux/x86_64

2023-07-17 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/521/

[Jul 15, 2023, 6:30:07 AM] (github) HDFS-17086. Fix the parameter settings in 
TestDiskspaceQuotaUpdate#updateCountForQuota (#5842). Contributed by Haiyang Hu.
[Jul 16, 2023, 4:20:46 AM] (github) HADOOP-18801. Delete path directly when it 
can not be parsed in trash. (#5744). Contributed by farmmamba.
[Jul 16, 2023, 5:57:31 AM] (github) HDFS-17075. Reconfig disk balancer 
parameters for datanode (#5823). Contributed by Haiyang Hu.




-1 overall


The following subsystems voted -1:
blanks hadolint mvnsite pathlen spotbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

spotbugs :

   module:hadoop-hdfs-project/hadoop-hdfs 
   Redundant nullcheck of oldLock, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory))
 Redundant null check at DataStorage.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory))
 Redundant null check at DataStorage.java:[line 695] 
   Redundant nullcheck of metaChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long,
 FileInputStream, FileChannel, String) Redundant null check at 
MappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long,
 FileInputStream, FileChannel, String) Redundant null check at 
MappableBlockLoader.java:[line 138] 
   Redundant nullcheck of blockChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at MemoryMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at MemoryMappableBlockLoader.java:[line 75] 
   Redundant nullcheck of blockChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at NativePmemMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at NativePmemMappableBlockLoader.java:[line 85] 
   Redundant nullcheck of metaChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,,
 long, FileInputStream, FileChannel, String) Redundant null check at 
NativePmemMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,,
 long, FileInputStream, FileChannel, String) Redundant null check at 
NativePmemMappableBlockLoader.java:[line 130] 
   
org.apache.hadoop.hdfs.server.namenode.top.window.RollingWindowManager$UserCounts
  doesn't override java.util.ArrayList.equals(Object) At 
RollingWindowManager.java:At RollingWindowManager.java:[line 1] 

spotbugs :

   module:hadoop-yarn-project/hadoop-yarn 
   Redundant nullcheck of it, which is known to be non-null in 

Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2023-07-17 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1091/

No changes


ERROR: File 'out/email-report.txt' does not exist

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2023-07-17 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/

[Jul 16, 2023, 4:20:46 AM] (github) HADOOP-18801. Delete path directly when it 
can not be parsed in trash. (#5744). Contributed by farmmamba.
[Jul 16, 2023, 5:57:31 AM] (github) HDFS-17075. Reconfig disk balancer 
parameters for datanode (#5823). Contributed by Haiyang Hu.




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

Failed junit tests :

   hadoop.mapreduce.v2.TestUberAM 
   hadoop.mapreduce.v2.TestMRJobsWithProfiler 
   hadoop.mapreduce.v2.TestMRJobs 
  

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-compile-cc-root.txt
 [96K]

   javac:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-compile-javac-root.txt
 [12K]

   blanks:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/blanks-eol.txt
 [15M]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/blanks-tabs.txt
 [2.0M]

   checkstyle:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-checkstyle-root.txt
 [13M]

   hadolint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-hadolint.txt
 [20K]

   pathlen:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-pathlen.txt
 [16K]

   pylint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-pylint.txt
 [20K]

   shellcheck:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-shellcheck.txt
 [24K]

   xml:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/xml.txt
 [24K]

   javadoc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/results-javadoc-javadoc-root.txt
 [244K]

   unit:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1290/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
 [72K]

Powered by Apache Yetus 0.14.0-SNAPSHOT   https://yetus.apache.org

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Re: [DISCUSS][HDFS] Add rust binding for libhdfs

2023-07-17 Thread Wei-Chiu Chuang
Inline

On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena  wrote:

> Forwarding from dev@hadoop to relevant ML
>
> Original mail:
> https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
>
> -Ayush
>
> On 2023/07/15 09:18:42 Xuanwo wrote:
> > Hello, everyone.
> >
> > I'm the maintainer of [hdfs-sys], a binding to the HDFS Native C API for
> Rust. I want to know: would it be a good idea to accept hdfs-sys as a part
> of the hadoop project?
> >
> > Users of hdfs-sys for now:
> >
> > - [OpenDAL]: An Apache Incubator project that allows users to easily and
> efficiently retrieve data from various storage services in a unified way.
> > - [Databend]: A modern cloud data warehouse focusing on reducing cost
> and complexity for your massive-scale analytics needs. (via OpenDAL)
> > - [RisingWave]: The distributed streaming database: SQL stream
> processing with Postgres-like experience. (via OpenDAL)
> > - [LakeSoul]: an end-to-end, realtime and cloud native Lakehouse
> framework
> >
> > License information for hdfs-sys:
> >
> > - hdfs-sys itself is licensed under Apache-2.0
> > - hdfs-sys only depends on the following libs: cc@1.0.73, glob@0.3.1,
> hdfs-sys@0.3.0, java-locator@0.1.5, lazy_static@1.4.0; they are all
> dual-licensed under Apache-2.0 and MIT.

>
> > Work needed if accepted:
> >
> > - Replace libdirent with the dirent API already implemented in the HDFS project.
> > - Remove all bundled hdfs C code.
>
What is libdirent? How is it relevant in this context?

How tightly coupled is it to a specific Hadoop version? I am wondering if
it's possible to host it in a separate Hadoop repo, if it's accepted. The
concern I have as a release manager is that it makes my life harder to
ensure the quality of a language binding that I am not familiar with.

> >
> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> > [OpenDAL]: https://github.com/apache/incubator-opendal
> > [Databend]: https://github.com/datafuselabs/databend
> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> >
> > Xuanwo
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: dev-h...@hadoop.apache.org
> >
> >
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (HDFS-17092) Datanode Full Block Report failed can lead to missing and under replicated blocks

2023-07-17 Thread microle.dong (Jira)
microle.dong created HDFS-17092:
---

 Summary: Datanode Full Block Report failed can lead to missing and 
under replicated blocks
 Key: HDFS-17092
 URL: https://issues.apache.org/jira/browse/HDFS-17092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: microle.dong


When restarting the namenode, we found that some datanodes did not report all 
of their blocks, which can lead to missing and under-replicated blocks. 
In the logs of a datanode with incomplete block reporting, I found that the 
first FBR attempt failed due to a namenode error:

 
{code:java}
2023-07-14 11:29:24,776 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x7b738b02996cd2,  containing 12 storage 
report(s), of which we sent 1. The reports had 633033 total blocks and used 1 
RPC(s). This took 169 msec to generate and 97730 msecs for RPC and NN 
processing. Got back no commands.
2023-07-14 11:29:24,776 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
IOException in offerService
java.net.SocketTimeoutException: Call From x.x.x.x/x.x.x.x to x.x.x.x:9002 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/x.x.x.x:13868 
remote=x.x.x.x/x.x.x.x:9002]; For more details see:  
http://wiki.apache.org/hadoop/SocketTimeout 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:863)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:822)
        at org.apache.hadoop.ipc.Client.call(Client.java:1480)
        at org.apache.hadoop.ipc.Client.call(Client.java:1413)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy14.blockReport(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:205)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:333)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:572)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:706)
        at java.lang.Thread.run(Thread.java:745){code}
The datanode's second FBR will reuse the same lease, which will make the 
namenode remove the datanode's lease (just as in HDFS-8930), leading the FBR 
to fail because no lease is left.

We should request a new lease and try again when a datanode FBR fails; a 
sketch of the idea follows.
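
Here is a minimal sketch of the proposed retry behavior, with entirely 
hypothetical types and names; the real fix would live in the DataNode's Java 
block-report path (around BPServiceActor#blockReport):
{code}
// Hypothetical sketch only: these stubs stand in for the NN RPC proxy;
// the actual change belongs in the DataNode's Java code.
struct RpcError { timeout: bool }
struct Namenode;

impl Namenode {
    fn request_block_report_lease(&self) -> Result<u64, RpcError> { Ok(1) }
    fn block_report(&self, _lease_id: u64) -> Result<(), RpcError> { Ok(()) }
}

// On a failed FBR, ask the NN for a fresh lease instead of retrying
// with the stale one the NN may already have removed (cf. HDFS-8930).
fn send_fbr_with_retry(nn: &Namenode, max_attempts: u32) -> Result<(), RpcError> {
    let mut lease_id = nn.request_block_report_lease()?;
    for _ in 0..max_attempts {
        match nn.block_report(lease_id) {
            Ok(()) => return Ok(()),
            Err(e) if e.timeout => {
                // Stale lease: obtain a fresh one before retrying.
                lease_id = nn.request_block_report_lease()?;
            }
            Err(e) => return Err(e),
        }
    }
    Err(RpcError { timeout: true })
}
{code}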

 I am willing to submit a PR to fix this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org