Issue with JNI call for jniEnv->FindClass(/) in our application jar having HDFS client v3.4.0 bundled inside it.

2024-07-22 Thread Sonal Sharma A
Dear Hadoop Community Support Team,

I am writing to report an issue we are experiencing with the JNI call 
jniEnv->FindClass(/) in our application jar, which has HDFS client v3.4.0 
bundled inside it. The issue does not occur when the application jar is 
bundled with HDFS client version 3.3.6; in other words, the JNI call started 
failing after we upgraded the HDFS client.
There is no application code change regarding the way the class path, JNI 
environment, JNI options, and JNI args are initialized before calling this 
method. The only change is the upgrade from hdfs-client v3.3.6 to v3.4.0.

This JNI call is from C++ code, 
jniEnv->FindClass(/).
Here are the details of the issue:

HDFS client library: v3.4.0
OS: SLES 15 SP4
JNI call: jniEnv->FindClass(/)
The relative_class_path has forward slashes (/) between directories and is not an 
absolute path. The class_name is an application-specific class inside the jar.
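
For illustration, below is a minimal Java-side check we can run against the same jar 
to rule out a packaging problem (the jar path and class name are placeholders, not 
our real ones). It only confirms the class is present in the bundled jar under the 
expected name; it does not go through JNI itself:

```
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class FindClassCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder values; the real jar path and class name are application specific.
        String jarPath = args.length > 0 ? args[0] : "app-with-hdfs-client-3.4.0.jar";
        String jniName = args.length > 1 ? args[1] : "com/example/app/SomeClass";

        // JNI FindClass() expects slash-separated names; Class.forName() expects dots.
        String binaryName = jniName.replace('/', '.');

        try (URLClassLoader loader = new URLClassLoader(
                new URL[]{new File(jarPath).toURI().toURL()},
                ClassLoader.getSystemClassLoader())) {
            Class<?> clazz = Class.forName(binaryName, false, loader);
            System.out.println("Resolved: " + clazz.getName());
        }
    }
}
```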

We are getting the following error when the jniEnv->FindClass() method is 
called:
_dl_catch_error () from ../libc.so.6
in _dl_catch_error () from ..//libc.so.6
   in _dlerror_run () from ..//libdl.so.2
   in dlsym () from ..//libdl.so.2
   in NativeLookup::lookup_style(methodHandle const&, char*, char const*, int, 
bool, bool&, Thread*) () from
   ../java/jdk/lib/server/libjvm.so
   in NativeLookup::lookup_base(methodHandle const&, bool&, Thread*) ()
   from /../java/jdk/lib/server/libjvm.so
   in NativeLookup::lookup(methodHandle const&, bool&, Thread*) ()
   from /../java/jdk/lib/server/libjvm.so
   in InterpreterRuntime::prepare_native_call(JavaThread*, Method*) ()
   from /../java/jdk/lib/server/libjvm.so

We would greatly appreciate any insights or solutions you can provide to help 
resolve this issue.
If you need any further information or details, please do not hesitate to ask.

Regards
Sonal Sharma






Re: Hadoop compatibility with ubuntu jammy OS

2024-07-11 Thread Daniel Howard
We run Hadoop clusters on Ubuntu 22.04. We've been running on various
Hadoop and Ubuntu versions over the years and never had a problem. Mixing
Ubuntu LTS versions (different LTS releases on different nodes) has not been a
problem either. Beyond providing disks and setting a few standard Linux
system-level tunables, Hadoop is very self-contained. Java for the win.

On Wed, Jul 10, 2024 at 6:13 AM Vishal Rajan 
wrote:

> Hi Team,
> Please share compatibility charts for different versions of ubuntu
> with hdfs/hbase/yarn version in the mix. I was unable to find a
> compatibility chart in the apache hadoop documentation. Thanks in advance,
> looking forward to a response at the earliest from the Apache Hadoop team.
>
>
> --
> --Regards
> Vishal Rajan
>
>

-- 
http://dannyman.toldme.com








Re: Confused Deputy Custom Header Support in Hadoop Conf

2024-07-10 Thread Jones, Daniel Carl
Hey Saurav,

There’s this feature request open that I think matches what you’re looking for:

“[HADOOP-18562] New configuration for static headers to be added to all S3 
requests”
https://issues.apache.org/jira/browse/HADOOP-18562

The good news is that there’s been some activity there recently with a PR up!
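
Until that lands there is no supported switch for this. Purely as a sketch of the 
idea, once a configuration exists it would presumably be set on the Hadoop 
Configuration before the S3A filesystem is created, roughly like below. Note the 
property name here is hypothetical and only for illustration; HADOOP-18562 will 
define the real key and format.

```
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StaticHeaderSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical property; the actual key and value format will come from HADOOP-18562.
        conf.set("fs.s3a.example.static.headers",
                 "x-amz-expected-bucket-owner=111122223333");

        // The headers would then ride along on every request this FileSystem issues.
        try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
            fs.exists(new Path("s3a://example-bucket/some/key"));
        }
    }
}
```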

Danny

From: saurav kumar 
Date: Wednesday 10 July 2024 at 15:38
To: "user@hadoop.apache.org" 
Subject: [EXTERNAL] Confused Deputy Custom Header Support in Hadoop Conf




Hi All,

I am using Apache Hadoop with Apache Flink and have a requirement of passing 
confused deputy headers to all S3 requests being made from within the 
framework. I was not able to find any config that allows users to pass custom 
request headers that can be propagated as part of S3 API requests.

Is anyone aware of how to achieve this, or is there a recommended way to do it? 
If it's not supported yet, would this be a good candidate for a new feature 
request?

References:
Supported Configuration : 
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
Class which should add request headers as part of each request: 
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java


Thanks,
Saurav Kumar


Confused Deputy Custom Header Support in Hadoop Conf

2024-07-10 Thread saurav kumar
Hi All,

I am using Apache Hadoop with Apache Flink and have a requirement of
passing confused deputy headers to all S3 requests being made from within
the framework. I was not able to find any config that allows users to pass
custom request headers that can be propagated as part of S3 API requests.

Is anyone aware of how to achieve this, or is there a recommended way to do it?
If it's not supported yet, would this be a good candidate for a new feature
request?

References:
Supported Configuration :
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
Class which should add request headers as part of each request:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java


Thanks,
Saurav Kumar


Re: Hadoop compatibility with ubuntu jammy OS

2024-07-10 Thread Jim Halfpenny
Hi Vishal,
Since most if not all of the components in Hadoop use Java, the operating system 
used tends to make little difference. You are not likely to find a chart 
showing which version of Ubuntu is supported by Hadoop, but you should find 
details as to which Java versions are required. Off the top of my head I 
believe Java 11 is required.

Kind regards,
Jim

> On 10 Jul 2024, at 11:13, Vishal Rajan  wrote:
> 
> Hi Team, 
> Please share compatibility charts for different versions of ubuntu with 
> hdfs/hbase/yarn version in the mix. I was unable to find a compatibility 
> chart in the apache hadoop documentation. Thanks in advance, looking forward 
> to a response at the earliest from the Apache Hadoop team.
> 
> 
> --
> --Regards
> Vishal Rajan
> 



Hadoop compatibility with ubuntu jammy OS

2024-07-10 Thread Vishal Rajan
Hi Team,
Please share compatibility charts for different versions of ubuntu with
hdfs/hbase/yarn version in the mix. I was unable to find a compatibility
chart in the apache hadoop documentation. Thanks in advance, looking
forward to a response at the earliest from the Apache Hadoop team.


-- 
--Regards
Vishal Rajan


RE: Queries wrt HDFS 3.4.0

2024-07-09 Thread Sonal Sharma A
Hello Team,
Requesting a reply to the query below; we are stuck in our development 
because of this.

Regards
Sonal Sharma

From: Sonal Sharma A
Sent: Monday, July 1, 2024 5:49 PM
To: core-u...@hadoop.apache.org
Subject: RE: Queries wrt HDFS 3.4.0

Hello Team,

Please find detailed query as below:

We are planning to upgrade to HDFS 3.4.0 (client side), which fixes the majority of 
the CVEs listed in our scan reports. However, we have three CVEs on transitive 
third-party packages (3PPs) included in hadoop-common which are not fixed in 
HDFS v3.4.0.

Our query: if we update the individual transitive 3PPs to the versions in which 
the CVEs are fixed, is HDFS client 3.4.0 compatible with those versions? For 
example, is HDFS client 3.4.0 compatible with commons-compress-1.26.0 and 
apache-avro-1.11.3?

CVE Id         | Current version (HDFS 3.3.6) | Updated version (HDFS 3.4.0) | CVE fixed in 3PP version    | Severity
CVE-2024-25710 | commons-compress-1.21        | commons-compress-1.24.0      | commons-compress-1.26.0     | High
CVE-2024-26308 | commons-compress-1.21        | commons-compress-1.24.0      | commons-compress-1.26.0     | High
CVE-2023-39410 | avro:1.7.7                   | avro:1.9.2                   | apache-avro 1.11.3          | High

Regards
Sonal Sharma

From: Sonal Sharma A
Sent: Monday, July 1, 2024 5:48 PM
To: core-u...@hadoop.apache.org
Subject: Queries wrt HDFS 3.4.0

Hello Team,

We are using HDFS 3.3.6 and planning to upgrade to HDFS 3.4.0. We have two 
queries regarding this, please help here:

  1.  The commons-compress version shipped with HDFS 3.4.0 is 1.24.0. Will HDFS 
still work if we upgrade commons-compress to 1.26.0?
  2.  Likewise, the apache-avro version shipped with HDFS 3.4.0 is 1.9.2. Will 
HDFS still work if we upgrade apache-avro to 1.11.3?

Regards
Sonal Sharma




Re: Queries wrt HDFS 3.4.0

2024-07-09 Thread Ayush Saxena
Hi Sonal,
It is more of a question for commons-compress: if that dependency is
backward compatible, things should work. We don't test such scenarios; in
an ideal scenario, we should be using the versions packaged with the Hadoop
release.

Regarding Avro, that could be problematic; see the past discussions in
HADOOP-13386.

btw, I couldn't decode what 3PP means

-Ayush

On Tue, 9 Jul 2024 at 14:06, Sonal Sharma A
 wrote:

> Hello Team,
>
>
>
> We are planning to upgrade to HDFS 3.4.0 (client side) which fixes
> majority of the CVEs listed by our scan reports. However we have three CVEs
> on transitive 3PPs included in hadoop-common which are not fixed in HDFS
> v3.4.0.
>
>
>
> Our query is that if we update the individual transitive 3PPs to the
> versions in which CVEs are fixed, then Is HDFS client 3.4.0 compatible with
> these versions? For example, Is HDFS client 3.4.0 compatible with
> commons-compress-1.26.0 and apache-avro-1.11.3?
>
>
>
> CVE Id         | Current version (HDFS 3.3.6) | Updated version (HDFS 3.4.0) | CVE fixed in 3PP version    | Severity
> CVE-2024-25710 | commons-compress-1.21        | commons-compress-1.24.0      | commons-compress-1.26.0     | High
> CVE-2024-26308 | commons-compress-1.21        | commons-compress-1.24.0      | commons-compress-1.26.0     | High
> CVE-2023-39410 | avro:1.7.7                   | avro:1.9.2                   | apache-avro 1.11.3          | High
>
>
>
> Regards
>
> Sonal Sharma
>
>
>


Queries wrt HDFS 3.4.0

2024-07-09 Thread Sonal Sharma A
Hello Team,

We are planning to upgrade to HDFS 3.4.0 (client side), which fixes the majority of 
the CVEs listed in our scan reports. However, we have three CVEs on transitive 
third-party packages (3PPs) included in hadoop-common which are not fixed in 
HDFS v3.4.0.

Our query: if we update the individual transitive 3PPs to the versions in which 
the CVEs are fixed, is HDFS client 3.4.0 compatible with those versions? For 
example, is HDFS client 3.4.0 compatible with commons-compress-1.26.0 and 
apache-avro-1.11.3?

CVE Id         | Current version (HDFS 3.3.6) | Updated version (HDFS 3.4.0) | CVE fixed in 3PP version    | Severity
CVE-2024-25710 | commons-compress-1.21        | commons-compress-1.24.0      | commons-compress-1.26.0     | High
CVE-2024-26308 | commons-compress-1.21        | commons-compress-1.24.0      | commons-compress-1.26.0     | High
CVE-2023-39410 | avro:1.7.7                   | avro:1.9.2                   | apache-avro 1.11.3          | High

Regards
Sonal Sharma



Inquiry About Upgrading Node.js Version for Hadoop

2024-06-18 Thread Dingli Zhang

Hi all,

I am writing to inquire about any plans to upgrade the Node.js version 
used in the Hadoop project. As you may be aware, Node.js 12 has reached 
its end-of-life and is no longer supported. This could potentially 
expose our systems to security vulnerabilities and compatibility issues 
with newer tools and libraries.


Are there any plans to upgrade Node.js to a higher LTS version?

Thank you for your attention to this matter. I look forward to your 
response.


Best regards,
Dingli





Re: Update UGI with new tokens during the lifespan of a yarn application

2024-06-11 Thread Clay B.

Hi Ankur,

There was some work I did in HADOOP-16298; the final code I used for 
$dayjob works for HDFS and HBase (HBase is non-renewable) tokens but 
Kubernetes was doing the on-disk token updates for me[1]; I just had to 
refresh the state in UGI. I ended up making the code proactively refresh 
tokens rather than wait for a token error, as there was a race condition 
when HBase 2.x came out. I think my last copy as I was trying to port it 
upstream was at: 
https://github.com/cbaenziger/hadoop/tree/hadoop-16298-wip (however, I no 
longer recall what was left to do).


Unfortunately, I got moved off this for my day job so haven't had time to 
revisit completing the contribution. Perhaps some of my hacks can help 
you? Also the testing got a bit zany[2] to try and tickle races.


-Clay

[1]: A shim to get tokens similarly can be seen at 
https://github.com/cbaenziger/hadoop-token-cli
[2]: The testing code I used is at 
https://github.com/cbaenziger/delegation_token_tests
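
For what it's worth, the UGI refresh step itself boils down to something like the 
minimal sketch below; the token file location and the refresh trigger are 
deployment-specific assumptions, not code from my branch:

```
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class TokenRefresher {
    /** Re-read a token file that something else (e.g. Kubernetes) keeps up to date. */
    public static void refresh(File tokenFile, Configuration conf) throws Exception {
        // Parse the Hadoop token storage file written by the external token provider.
        Credentials fresh = Credentials.readTokenStorageFile(tokenFile, conf);
        // Merge the new tokens into the current user's UGI.
        UserGroupInformation.getCurrentUser().addCredentials(fresh);
    }
}
```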


On Tue, 11 Jun 2024, Wei-Chiu Chuang wrote:


That sounds like what Spark did. Take a look at this doc:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/README.md
The Spark AM has a Kerberos keytab and it periodically acquires a new 
delegation token (the old one is ignored) to make sure it always has a valid DT.
Finally, distribute the DT to all executors.

On Tue, Jun 11, 2024 at 4:34 AM Ankur Khanna  
wrote:

  Hi experts,

   

  I have a use-case with an external session token that is short lived and 
does not renew(ie, unlike a hadoop delegation token, the expiry time
  is not updated for this token). For a long running application (longer 
than the lifespan of the external token), I want to update the
  UGI/Credential object of each and every worker container with a new token.

  If I understand correctly, all delegation tokens are shared at the launch 
of a container.

  Is there any way to update the credential object after the launch of the 
container and during the lifespan of the application?


  Best,

  Ankur Khanna

   

 





Re: Update UGI with new tokens during the lifespan of a yarn application

2024-06-11 Thread Wei-Chiu Chuang
That sounds like what Spark did.
Take a look at this doc
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/README.md
The Spark AM has a Kerberos keytab and it periodically acquires a new
delegation token (the old one is ignored) to make sure it always has a
valid DT. Finally, distribute the DT to all executors.

On Tue, Jun 11, 2024 at 4:34 AM Ankur Khanna
 wrote:

> Hi experts,
>
>
>
> I have a use-case with an external session token that is short lived and
> does not renew(ie, unlike a hadoop delegation token, the expiry time is not
> updated for this token). For a long running application (longer than the
> lifespan of the external token), I want to update the UGI/Credential object
> of each and every worker container with a new token.
>
> If I understand correctly, all delegation tokens are shared at the launch
> of a container.
>
> Is there any way to update the credential object after the launch of the
> container and during the lifespan of the application?
>
>
> Best,
>
> Ankur Khanna
>
>
>
>
>


Update UGI with new tokens during the lifespan of a yarn application

2024-06-11 Thread Ankur Khanna
Hi experts,

I have a use-case with an external session token that is short-lived and does 
not renew (i.e., unlike a Hadoop delegation token, the expiry time is not updated 
for this token). For a long-running application (longer than the lifespan of 
the external token), I want to update the UGI/Credential object of each and 
every worker container with a new token.

If I understand correctly, all delegation tokens are shared at the launch of a 
container.

Is there any way to update the credential object after the launch of the 
container and during the lifespan of the application?

Best,
Ankur Khanna







Re: bootstrap standby namenode failure

2024-05-28 Thread anup ahire
Thanks Ayush,

I am trying to understand the reason why Active NN does not have a record
of txn ids that are in shared edit space.

On Sat, May 25, 2024 at 7:54 AM Ayush Saxena  wrote:

> Hi Anup,
> Did you explore: -skipSharedEditsCheck, Check this ticket once [1], if
> your use case is similar, little bit description can be found here
> [2], search for skipSharedEditsCheck, the jira does mention another
> solution as well, in case you don't like this or if it doesn't work
>
> -Ayush
>
>
> [1] https://issues.apache.org/jira/browse/HDFS-4120
> [2]
> https://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#namenode
>
> On Sat, 25 May 2024 at 01:59, anup ahire  wrote:
> >
> > Hello Team,
> >
> > I am trying to recover the failed node which has namenode and journal
> node,  the cluster has one active NN and 2 journal nodes currently.
> > When I am trying to setup node being recovered as standby, I am getting
> this error.
> >
> > java.io.IOException: Gap in transactions. Expected to be able to read up
> until at least txid 22450 but unable to find any edit logs containing txid 1
> >
> > Any idea what might be happening? As one NN active and 2 journal nodes
> are running, I was hoping all edit logs would be in sync.
> >
> > Thanks.
>


Re: HBase lz4 UnsatisfiedLinkError

2024-05-27 Thread fetch

Hi Ayush,

Upgrading to 2.6.0-hadoop3 worked, thanks so much!

On 2024-05-25 20:15, Ayush Saxena wrote:

Multiple things, the output of checknative only contains these stuff
only, not everything. From the code [1], So looking at your command
output everything is sorted there barring OpenSSL & PMDK which you
explicitly didn't ask for in your maven command & I believe you don't
need them either, in case you need them the instructions are there in
[2]

Looking at the trace:

org.apache.hadoop.io.compress.Lz4Codec.getLibraryName(Lz4Codec.java:73)

 at

You mentioned building ver 3.3.6, Your exception trace is calling
getLibraryName, which isn't present in the Lz4Codec.java in ver-3.3.6
[3], this method got removed as part of HADOOP-17292 [4] that is post
hadoop ver 3.3.1+, So, If you read the release notes of this ticket
you can see for Hadoop-3.3.1+ the Lz4 thing works OOTB

So, mostly it isn't a Hadoop problem.

What could be possible is the HBase version that you are using is
pulling in an older Hadoop release which is messing things up, So, I
would say try using the hadoop-3 binary of the latest version 2.6.0
[5]  & see how things go, else download the source tar of their latest
release 2.6.0 and build with -Phadoop-3.0
-Dhadoop-three.version=3.3.6, Looking at their source code they still
use 3.3.5 by default.

-Ayush

[1]
https://github.com/apache/hadoop/blob/1baf0e889fec54b6560417b62cada75daf6fe312/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeLibraryChecker.java#L137-L144
[2] https://github.com/apache/hadoop/blob/branch-3.3.6/BUILDING.txt
[3]
https://github.com/apache/hadoop/blob/branch-3.3.6/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/Lz4Codec.java#L73
[4] https://issues.apache.org/jira/browse/HADOOP-17292
[5]
https://www.apache.org/dyn/closer.lua/hbase/2.6.0/hbase-2.6.0-hadoop3-bin.tar.gz

On Sat, 25 May 2024 at 22:41,  wrote:


Hey Ayush, thanks for the advice!

Building 3.3.6 from an EL9.4 machine resulted in the following:

[root@localhost bin]# JAVA_HOME=/etc/alternatives/java_sdk_openjdk/
./hadoop checknative -a
2024-05-25 19:05:56,068 INFO bzip2.Bzip2Factory: Successfully loaded &
initialized native-bzip2 library system-native
2024-05-25 19:05:56,071 INFO zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library
2024-05-25 19:05:56,097 INFO nativeio.NativeIO: The native code was
built without PMDK support.
Native library checking:
hadoop:  true
/root/build/hadoop-3.3.6-src/hadoop-dist/target/hadoop-3.3.6/lib/native/libhadoop.so.1.0.0
zlib:true /lib64/libz.so.1
zstd  :  true /lib64/libzstd.so.1
bzip2:   true /lib64/libbz2.so.1
openssl: false EVP_CIPHER_CTX_block_size
ISA-L:   true /lib64/libisal.so.2
PMDK:false The native code was built without PMDK support.

No mention of lz4, though lz4[-devel] packages were installed on the
compiling host as per the BUILDING instructions. Is there a build 
option

I'm missing? I'm using:

* mvn -X package -Pdist,native -DskipTests -Dtar 
-Dmaven.javadoc.skip=true



Unfortunately the hbase "org.apache.hadoop.util.NativeLibraryChecker",
using this freshly made hadoop native library, also failed to load lz4
in the same way as the initial message with no extra information from 
debug:


2024-05-25T19:03:42,320 WARN  [main] lz4.Lz4Compressor:
java.lang.UnsatisfiedLinkError: 'void
org.apache.hadoop.io.compress.lz4.Lz4Compressor.initIDs()'
Exception in thread "main" java.lang.UnsatisfiedLinkError:
'java.lang.String
org.apache.hadoop.io.compress.lz4.Lz4Compressor.getLibraryName()'
 at
org.apache.hadoop.io.compress.lz4.Lz4Compressor.getLibraryName(Native
Method)
 at
org.apache.hadoop.io.compress.Lz4Codec.getLibraryName(Lz4Codec.java:73)
 at
org.apache.hadoop.util.NativeLibraryChecker.main(NativeLibraryChecker.java:109)

Thanks for your help!


On 5/25/24 4:16 PM, Ayush Saxena wrote:
> above things don't work then enable debug logging &
> then run the checknative command and capture the log & exception as
> here [2] & they might give you an an





Re: HBase lz4 UnsatisfiedLinkError

2024-05-25 Thread Ayush Saxena
Multiple things. The output of checknative only contains this fixed set of
libraries, not everything; see the code [1]. So looking at your command
output, everything is sorted there barring OpenSSL & PMDK, which you
explicitly didn't ask for in your maven command & I believe you don't
need them either; in case you need them, the instructions are in
[2]

Looking at the trace:
> org.apache.hadoop.io.compress.Lz4Codec.getLibraryName(Lz4Codec.java:73)
 at

You mentioned building ver 3.3.6. Your exception trace is calling
getLibraryName, which isn't present in Lz4Codec.java in ver 3.3.6
[3]; this method was removed as part of HADOOP-17292 [4], which is in
hadoop ver 3.3.1+. So, if you read the release notes of that ticket
you can see that for Hadoop 3.3.1+ the Lz4 support works OOTB.

So, mostly it isn't a Hadoop problem.

What could be happening is that the HBase version you are using is
pulling in an older Hadoop release, which is messing things up. So I
would say try using the hadoop-3 binary of the latest version 2.6.0
[5] & see how things go, else download the source tar of their latest
release 2.6.0 and build with -Phadoop-3.0
-Dhadoop-three.version=3.3.6; looking at their source code they still
use 3.3.5 by default.
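
If you want a quick sanity check that lz4 round-trips on Hadoop 3.3.1+ without any
native libhadoop, a small sketch like this (plain Java, just hadoop-common and its
lz4-java dependency on the classpath) should run without an UnsatisfiedLinkError:

```
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Lz4Codec;

public class Lz4RoundTrip {
    public static void main(String[] args) throws Exception {
        Lz4Codec codec = new Lz4Codec();
        codec.setConf(new Configuration());

        byte[] input = "lz4 via lz4-java, no native libhadoop needed"
                .getBytes(StandardCharsets.UTF_8);

        // Compress into memory.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (CompressionOutputStream out = codec.createOutputStream(compressed)) {
            out.write(input);
        }

        // Decompress and print the original text again.
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (CompressionInputStream in = codec.createInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            IOUtils.copyBytes(in, restored, 4096, false);
        }
        System.out.println(restored.toString(StandardCharsets.UTF_8.name()));
    }
}
```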

-Ayush

[1] 
https://github.com/apache/hadoop/blob/1baf0e889fec54b6560417b62cada75daf6fe312/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeLibraryChecker.java#L137-L144
[2] https://github.com/apache/hadoop/blob/branch-3.3.6/BUILDING.txt
[3] 
https://github.com/apache/hadoop/blob/branch-3.3.6/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/Lz4Codec.java#L73
[4] https://issues.apache.org/jira/browse/HADOOP-17292
[5] 
https://www.apache.org/dyn/closer.lua/hbase/2.6.0/hbase-2.6.0-hadoop3-bin.tar.gz

On Sat, 25 May 2024 at 22:41,  wrote:
>
> Hey Ayush, thanks for the advice!
>
> Building 3.3.6 from an EL9.4 machine resulted in the following:
>
> [root@localhost bin]# JAVA_HOME=/etc/alternatives/java_sdk_openjdk/
> ./hadoop checknative -a
> 2024-05-25 19:05:56,068 INFO bzip2.Bzip2Factory: Successfully loaded &
> initialized native-bzip2 library system-native
> 2024-05-25 19:05:56,071 INFO zlib.ZlibFactory: Successfully loaded &
> initialized native-zlib library
> 2024-05-25 19:05:56,097 INFO nativeio.NativeIO: The native code was
> built without PMDK support.
> Native library checking:
> hadoop:  true
> /root/build/hadoop-3.3.6-src/hadoop-dist/target/hadoop-3.3.6/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> zstd  :  true /lib64/libzstd.so.1
> bzip2:   true /lib64/libbz2.so.1
> openssl: false EVP_CIPHER_CTX_block_size
> ISA-L:   true /lib64/libisal.so.2
> PMDK:false The native code was built without PMDK support.
>
> No mention of lz4, though lz4[-devel] packages were installed on the
> compiling host as per the BUILDING instructions. Is there a build option
> I'm missing? I'm using:
>
> * mvn -X package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true
>
>
> Unfortunately the hbase "org.apache.hadoop.util.NativeLibraryChecker",
> using this freshly made hadoop native library, also failed to load lz4
> in the same way as the initial message with no extra information from debug:
>
> 2024-05-25T19:03:42,320 WARN  [main] lz4.Lz4Compressor:
> java.lang.UnsatisfiedLinkError: 'void
> org.apache.hadoop.io.compress.lz4.Lz4Compressor.initIDs()'
> Exception in thread "main" java.lang.UnsatisfiedLinkError:
> 'java.lang.String
> org.apache.hadoop.io.compress.lz4.Lz4Compressor.getLibraryName()'
>  at
> org.apache.hadoop.io.compress.lz4.Lz4Compressor.getLibraryName(Native
> Method)
>  at
> org.apache.hadoop.io.compress.Lz4Codec.getLibraryName(Lz4Codec.java:73)
>  at
> org.apache.hadoop.util.NativeLibraryChecker.main(NativeLibraryChecker.java:109)
>
> Thanks for your help!
>
>
> On 5/25/24 4:16 PM, Ayush Saxena wrote:
> > above things don't work then enable debug logging &
> > then run the checknative command and capture the log & exception as
> > here [2] & they might give you an an




Re: HBase lz4 UnsatisfiedLinkError

2024-05-25 Thread fetch

Hey Ayush, thanks for the advice!

Building 3.3.6 from an EL9.4 machine resulted in the following:

[root@localhost bin]# JAVA_HOME=/etc/alternatives/java_sdk_openjdk/ 
./hadoop checknative -a
2024-05-25 19:05:56,068 INFO bzip2.Bzip2Factory: Successfully loaded & 
initialized native-bzip2 library system-native
2024-05-25 19:05:56,071 INFO zlib.ZlibFactory: Successfully loaded & 
initialized native-zlib library
2024-05-25 19:05:56,097 INFO nativeio.NativeIO: The native code was 
built without PMDK support.

Native library checking:
hadoop:  true 
/root/build/hadoop-3.3.6-src/hadoop-dist/target/hadoop-3.3.6/lib/native/libhadoop.so.1.0.0

zlib:    true /lib64/libz.so.1
zstd  :  true /lib64/libzstd.so.1
bzip2:   true /lib64/libbz2.so.1
openssl: false EVP_CIPHER_CTX_block_size
ISA-L:   true /lib64/libisal.so.2
PMDK:    false The native code was built without PMDK support.

No mention of lz4, though lz4[-devel] packages were installed on the 
compiling host as per the BUILDING instructions. Is there a build option 
I'm missing? I'm using:


* mvn -X package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true


Unfortunately the hbase "org.apache.hadoop.util.NativeLibraryChecker", 
using this freshly made hadoop native library, also failed to load lz4 
in the same way as the initial message with no extra information from debug:


2024-05-25T19:03:42,320 WARN  [main] lz4.Lz4Compressor: 
java.lang.UnsatisfiedLinkError: 'void 
org.apache.hadoop.io.compress.lz4.Lz4Compressor.initIDs()'
Exception in thread "main" java.lang.UnsatisfiedLinkError: 
'java.lang.String 
org.apache.hadoop.io.compress.lz4.Lz4Compressor.getLibraryName()'
    at 
org.apache.hadoop.io.compress.lz4.Lz4Compressor.getLibraryName(Native 
Method)
    at 
org.apache.hadoop.io.compress.Lz4Codec.getLibraryName(Lz4Codec.java:73)
    at 
org.apache.hadoop.util.NativeLibraryChecker.main(NativeLibraryChecker.java:109)


Thanks for your help!


On 5/25/24 4:16 PM, Ayush Saxena wrote:

above things don't work then enable debug logging &
then run the checknative command and capture the log & exception as
here [2] & they might give you an an





Re: bootstrap standby namenode failure

2024-05-25 Thread Ayush Saxena
Hi Anup,
Did you explore -skipSharedEditsCheck? Check this ticket once [1] to see if
your use case is similar; a little bit of description can be found here
[2] (search for skipSharedEditsCheck). The jira does mention another
solution as well, in case you don't like this one or it doesn't work.

-Ayush


[1] https://issues.apache.org/jira/browse/HDFS-4120
[2] 
https://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#namenode

On Sat, 25 May 2024 at 01:59, anup ahire  wrote:
>
> Hello Team,
>
> I am trying to recover the failed node which has namenode and journal node,  
> the cluster has one active NN and 2 journal nodes currently.
> When I am trying to setup node being recovered as standby, I am getting this 
> error.
>
> java.io.IOException: Gap in transactions. Expected to be able to read up 
> until at least txid 22450 but unable to find any edit logs containing txid 1
>
> Any idea what might be happening? As one NN active and 2 journal nodes are 
> running, I was hoping all edit logs would be in sync.
>
> Thanks.




Re: HBase lz4 UnsatisfiedLinkError

2024-05-25 Thread Ayush Saxena
Hi,

We can't help with the HBase thing, for that you need to chase the
HBase user ML.

For the `hadoop checknative -a` output showing false, maybe the native
libraries that are pre-built & published aren't compatible with the OS
you are using. In that case you need to build them on the "same" OS
(the instructions are here: [1]) & replace the existing native files
with the freshly generated ones.

Second, Run the command `hadoop jnipath` and see the output path &
check you have the native libs in that directory.

If both of the above things don't work then enable debug logging &
then run the checknative command and capture the log & exception as
here [2] & they might give you an answer why the native libraries
aren't getting loaded.

Most probably solving the Hadoop stuff should dispel the HBase or any
downstream problem tethered to native libs.

-Ayush


[1] 
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html#Build
[2] 
https://github.com/apache/hadoop/blob/1baf0e889fec54b6560417b62cada75daf6fe312/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeCodeLoader.java#L52-L55

On Sat, 25 May 2024 at 18:58,  wrote:
>
> Hi all,
>
> Using hadoop 3.3.6, hbase 2.5.6, and jdk 11  on EL9 we're seeing an
> UnsatisfiedLinkError when running the NativeLibraryChecker. It's
> identical to this question on StackOverflow:
>
> *
> https://stackoverflow.com/questions/72517212/check-hbase-native-extension-got-warn-main-lz4-lz4compressor-java-lang-unsati
>
> I've noticed it was moved from the os packages to lz4-java,  and now
> exists in hbase/libs. Is this just a java library path issue?
>
> On the NativeLibrary docs page it says the native hadoop library
> includes various components, including lz4. When running 'hadoop
> checknative -a' as is done in the example down the page, our output is
> missing lz4.
>
> *
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html
>
> Thanks for your time!
>



HBase lz4 UnsatisfiedLinkError

2024-05-25 Thread fetch

Hi all,

Using hadoop 3.3.6, hbase 2.5.6, and jdk 11  on EL9 we're seeing an 
UnsatisfiedLinkError when running the NativeLibraryChecker. It's 
identical to this question on StackOverflow:


* 
https://stackoverflow.com/questions/72517212/check-hbase-native-extension-got-warn-main-lz4-lz4compressor-java-lang-unsati


I've noticed it was moved from the os packages to lz4-java,  and now 
exists in hbase/libs. Is this just a java library path issue?


On the NativeLibrary docs page it says the native hadoop library 
includes various components, including lz4. When running 'hadoop 
checknative -a' as is done in the example down the page, our output is 
missing lz4.


* 
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html


Thanks for your time!




bootstrap standby namenode failure

2024-05-24 Thread anup ahire
Hello Team,

I am trying to recover the failed node which has namenode and journal
node,  the cluster has one active NN and 2 journal nodes currently.
When I am trying to setup node being recovered as standby, I am getting
this error.


java.io.IOException: Gap in transactions. Expected to be able to read up
until at least txid 22450 but unable to find any edit logs containing txid 1
Any idea what might be happening? As one NN active and 2 journal nodes are
running, I was hoping all edit logs would be in sync.

Thanks.


Memory Leak in Hadoop AWS integration

2024-05-17 Thread shashank admane
Hello Team,


I am executing DistCP commands in my SpringBoot application to copy files
to AWS S3 buckets.
JAR used for this integration -

   -  hadoop-aws-2.10.1.jar
   -  aws-java-sdk-bundle-1.11.837.jar

My application goes out of memory and I have to force GC to clear the
memory. My investigation shows that most of the Hadoop classes are retained by
AWS SDK classes even in the last leg of the DistCp execution. Apparently, GC
cannot collect those classes until they are released by the AWS classes.
Even static variables are referenced by AWS SDK classes. This is leading
to a memory leak, and as I run my application in a thread I see many more
instances being retained, making the leak worse.

I am stuck trying to resolve these memory leaks. Any help on this would be
appreciated.

Thank You,
Shashank





Disk usage when using fast upload

2024-05-08 Thread Faiz Halde
Hi,

I have a quick question about hadoop s3 fast upload.

When using fast upload to upload a file of, let's say, 100 GB with disk-based
buffering set to 128 MB blocks (active blocks = 1 for simplicity), will my
disk usage be capped at some limit or can it go up to a full 100 GB? i.e. will
the S3A client delete older chunks it has buffered on the disk? We are trying to
understand how to size our volumes.

I know that without fast upload, a 100G disk usage is to be expected.
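
For reference, these are the settings we are looking at; a minimal sketch, with the
key names as documented for S3A and the values just our example sizing:

```
import org.apache.hadoop.conf.Configuration;

public class FastUploadSizingExample {
    public static Configuration s3aUploadConf() {
        Configuration conf = new Configuration();
        // Buffer blocks on local disk before uploading them as multipart parts.
        conf.set("fs.s3a.fast.upload.buffer", "disk");
        // Directory (or directories) used for the on-disk block buffers.
        conf.set("fs.s3a.buffer.dir", "/mnt/scratch/s3a");
        // Size of each buffered/multipart block: 128 MB in bytes.
        conf.setLong("fs.s3a.multipart.size", 128L * 1024 * 1024);
        // Number of blocks a single output stream may have queued or uploading at once.
        conf.setInt("fs.s3a.fast.upload.active.blocks", 1);
        return conf;
    }
}
```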

Thanks
Faiz


Re: How is HDFS Erasure Coding Phase II now?

2024-04-22 Thread zhangzhengli
Hi,
Thanks for the reply. I'm new to this. I understand from the attachment in 
HDFS-8030 [1], in the description of that document [2], that the encoding is done 
offline: the blocks initially still exist as replicas and are converted to 
erasure-coded form later. I know erasure coding is mainly suitable for cold data.

[1]https://issues.apache.org/jira/browse/HDFS-8030
[2]https://issues.apache.org/jira/secure/attachment/12775826/HDFSErasureCodingPhaseII-20151204.pdf

Sent from Mail for Windows

From: Ayush Saxena
Sent: 22 April 2024, 19:06
To: 1278789...@qq.com
Cc: user@hadoop.apache.org
Subject: Re: How is HDFS Erasure Coding Phase II now?

Hi,
>  Or is it just not developed to this point?

It isn't developed & I don't think there is any effort going on in that 
direction

> I learned that continuous layout can ensure the locality of file blocks

How? Erasure Coding will have BlockGroups not just one Block, whether you write 
in a striped manner or in a Contiguous manner, it will spread over equal number 
of Datanodes based on the BPP, I am not sure if anything changes with locality, 
just by the way how EC Blocks are written.

> , I have large files and write them once and read them many times.

Erasure Coding in general was developed for storing Archival data, so you need 
to figure out how "many" is ok.


-Ayush

On Mon, 22 Apr 2024 at 15:56, zhangzhengli <1278789...@qq.com.invalid> wrote:
Hi all, Since HDFS-8030, hdfs ec continuous layout has not developed much. Are 
there any difficulties? Or is it just not developed to this point?
 
I learned that continuous layout can ensure the locality of file blocks, and I 
want to use this feature in near-data scenarios. For example, I have large 
files and write them once and read them many times.
 
Any suggestions are appreciated
 
Sent from Mail for Windows
 



Re: How is HDFS Erasure Coding Phase II now?

2024-04-22 Thread Ayush Saxena
Hi,
>  Or is it just not developed to this point?

It isn't developed & I don't think there is any effort going on in that
direction

> I learned that continuous layout can ensure the locality of file blocks

How? Erasure Coding will have BlockGroups, not just one Block. Whether you
write in a striped manner or in a contiguous manner, it will spread over an
equal number of Datanodes based on the BPP, so I am not sure that anything
changes with locality just because of the way the EC blocks are written.

> , I have large files and write them once and read them many times.

Erasure Coding in general was developed for storing Archival data, so you
need to figure out how "many" is ok.
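
For completeness, applying one of today's (striped) policies looks roughly like the
sketch below; it assumes fs.defaultFS points at an HDFS cluster and that the built-in
RS-6-3-1024k policy has already been enabled:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ApplyEcPolicy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path dir = new Path("/archive/cold-data");   // example directory

        try (FileSystem fs = FileSystem.get(conf)) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            dfs.mkdirs(dir);
            // Files written under this directory become RS(6,3) striped block groups.
            dfs.setErasureCodingPolicy(dir, "RS-6-3-1024k");
            System.out.println("EC policy: " + dfs.getErasureCodingPolicy(dir).getName());
        }
    }
}
```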


-Ayush

On Mon, 22 Apr 2024 at 15:56, zhangzhengli <1278789...@qq.com.invalid>
wrote:

> Hi all, Since HDFS-8030, hdfs ec continuous layout has not developed much.
> Are there any difficulties? Or is it just not developed to this point?
>
>
>
> I learned that continuous layout can ensure the locality of file blocks,
> and I want to use this feature in near-data scenarios. For example, I have
> large files and write them once and read them many times.
>
>
>
> Any suggestions are appreciated
>
>
>
> Sent from Mail for Windows
>
>
>


How is HDFS Erasure Coding Phase II now?

2024-04-22 Thread zhangzhengli
Hi all, since HDFS-8030, the HDFS EC contiguous layout has not developed much. Are 
there any difficulties? Or has it just not been developed to this point?

I learned that the contiguous layout can preserve the locality of file blocks, and I 
want to use this feature in near-data scenarios. For example, I have large 
files and write them once and read them many times.

Any suggestions are appreciated

Sent from Mail for Windows



Re: How to contribute code for the first time

2024-04-16 Thread Ayush Saxena
Hi Jim,
Directly create a PR against the trunk branch in the Hadoop repo. If it is accepted, 
then add the link to the PR and resubmit your request for a Jira account; it will 
get approved.

-Ayush

> On 17 Apr 2024, at 10:02 AM, Jim Chen  wrote:
> 
> 
> Hi all, I want to optimize a script in dev-support in a hadoop project, how 
> do I submit a PR?
> 
> I tried to create apply jira account so that I can create an issue in jira 
> first, but the application was rejected. I was prompted to send a developer 
> email first.
> 
> Can anyone help me with this? Thanks!




How to contribute code for the first time

2024-04-16 Thread Jim Chen
Hi all, I want to optimize a script in dev-support in a hadoop project, how
do I submit a PR?

I tried to apply for a Jira account so that I could create an issue in Jira
first, but the application was rejected. I was prompted to send an email to
the developer list first.

Can anyone help me with this? Thanks!


How to contribute code for the first time (第一次如何贡献代码)

2024-04-16 Thread Jim Chen
Hi all, I want to optimize a script in dev-support in the Hadoop project. How do I 
submit a PR?

I tried to apply for a Jira account so that I could first create an issue in Jira, 
but the application was rejected. It prompted me to send an email to the developer 
list first.

Can anyone help me with this? Thanks!


Re: Recommended way of using hadoop-minicluster für unit testing?

2024-04-15 Thread Richard Zowalla
Hi Ayush,

thanks for your time investigating!

I followed your recommendation and it seems to work (also for some of
our consumer projects), so thanks a lot for your time!

Gruß
Richard


Am Samstag, dem 13.04.2024 um 03:35 +0530 schrieb Ayush Saxena:
> Hi Richard,
> Thanx for sharing the steps to reproduce the issue. I cloned the
> Apache Storm repo and was able to repro the issue. The build was
> indeed failing due to missing classes.
> 
> Spent some time to debug the issue, might not be very right (no
> experience with Storm), There are Two ways to get this going
> 
> First Approach: If we want to use the shaded classes
> 
> 1. I think the artifact to be used for minicluster should be `hadoop-
> client-minicluster`, even spark uses the same [1], the one which you
> are using is `hadoop-minicluster`, which in its own is empty
> ```
> ayushsaxena@ayushsaxena ~ %  jar tf
> /Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-
> minicluster/3.3.6/hadoop-minicluster-3.3.6.jar  | grep .class
> ayushsaxena@ayushsaxena ~ %
> ```
> 
> It just defines artifacts which are to be used by `hadoop-client-
> minicluster` and this jar has that shading and stuff, using `hadoop-
> minicluster` is like adding the hadoop dependencies into the pom
> transitively, without any shading or so, which tends to conflict with
> `hadoop-client-api` and `hadoop-client-runtime` jars, which uses the
> shaded classes.
> 
> 2. Once you change `hadoop-minicluster` to `hadoop-client-
> minicluster`, still the tests won't pass, the reason being the
> `storm-autocreds` dependency which pulls in the hadoop jars via
> `hbase-client` & `hive-exec`, So, we need to exclude them as well
> 
> 3. I reverted your classpath hack, changed the jar, & excluded the
> dependencies from storm-autocreds & ran the storm-hdfs tests & all
> the tests passed, which were failing initially without any code
> change
> ```
> [INFO] Results:
> [INFO]
> [INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
> [INFO]
> [INFO] --
> --
> [INFO] BUILD SUCCESS
> [INFO] --
> --
> ```
> 
> 4. Putting the code diff here might make this mail unreadable, so I
> am sharing the link to the commit which fixed Storm for me here [2],
> let me know if it has any access issues, I will put the diff on the
> mail itself in text form.
> 
> Second Approach: If we don't want to use the shaded classes
> 
> 1. The `hadoop-client-api` & the` hadoop-client-runtime` jars uses
> shading which tends to conflict with your non shaded `hadoop-
> minicluster`, Rather than using these jars use the `hadoop-client`
> jar
> 
> 2. I removed your hack & changed those two jars with `hadoop-client`
> jar & the storm-hdfs tests passes
> 
> 3. I am sharing the link to the commit in my fork, it is here at [3],
> one advantage is, you don't have to change your existing jar nor you
> would need to add those exclusions in the `storm-cred` dependency.
> 
> ++ Adding common-dev, in case any fellow developers with more
> experience around using the hadoop-client jars can help, if things
> still don't work or Storm needs something more. The downstream
> projects which I have experience with don't use these jars (which
> they should ideally) :-) 
> 
> -Ayush
> 
> 
> [1] https://github.com/apache/spark/blob/master/pom.xml#L1382
> [2]
> https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
> [3] 
> https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4b
> ab9b3f7ac8
> 
> 
> On Fri, 12 Apr 2024 at 10:41, Richard Zowalla 
> wrote:
> > Hi,
> > 
> > thanks for the fast reply. The PR is here [1].
> > 
> > It works, if I exclude the client-api and client-api-runtime from
> > being scanned in surefire, which is a hacky workaround for the
> > actual issue.
> > 
> > The hadoop-commons jar is a transient dependency of the
> > minicluster, which is used for testing.
> > 
> > Debugging the situation shows, that HttpServer2  is in the same
> > package in hadoop-commons as well as in the client-api but with
> > differences in methods / classes used, so depending on the
> > classpath order the wrong class is loaded.
> > 
> > Stacktraces are in the first GH Action run.here: [1]. 
> > 
> > A reproducer would be to check out Storm, go to storm-hdfs and
> > remove the exclusion in [2] and run the tests in that module, which
> > will fail due to a missing jetty server class (as the HTTPServer2
> > class is loaded from client-api instead of minicluster).
> > 
> > Gruß & Thx
> > Richard 
> > 
> > [1] https://github.com/apache/storm/pull/3637
> > [2]
> > https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120
> > 
> > On 2024/04/11 21:29:13 Ayush Saxena wrote:
> > > Hi Richard,
> > > I am not able to decode the issue properly here, It would have
> > > been
> > > better if you shared the PR or the failure 

Re: Recommended way of using hadoop-minicluster für unit testing?

2024-04-12 Thread Ayush Saxena
Hi Richard,
Thanx for sharing the steps to reproduce the issue. I cloned the Apache
Storm repo and was able to repro the issue. The build was indeed failing
due to missing classes.

Spent some time debugging the issue; I might not be entirely right (no
experience with Storm). There are two ways to get this going:

*First Approach: If we want to use the shaded classes*

1. I think the artifact to be used for minicluster should be
`hadoop-client-minicluster`, even spark uses the same [1], the one which
you are using is `hadoop-minicluster`, which in its own is empty
```
ayushsaxena@ayushsaxena ~ %  jar tf
/Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar
 | grep .class
ayushsaxena@ayushsaxena ~ %
```

It just defines the artifacts which are to be used by
`hadoop-client-minicluster`, and that jar has the shading and so on. Using
`hadoop-minicluster` is like adding the hadoop dependencies into the pom
transitively, without any shading, which tends to conflict with the
`hadoop-client-api` and `hadoop-client-runtime` jars, which use the shaded
classes.

2. Once you change `hadoop-minicluster` to `hadoop-client-minicluster`,
still the tests won't pass, the reason being the `storm-autocreds`
dependency which pulls in the hadoop jars via `hbase-client` & `hive-exec`,
So, we need to exclude them as well

3. I reverted your classpath hack, changed the jar, & excluded the
dependencies from storm-autocreds & ran the storm-hdfs tests & all the
tests passed, which were failing initially without any code change
```
[INFO] Results:
[INFO]
[INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
```

4. Putting the code diff here might make this mail unreadable, so I am
sharing the link to the commit which fixed Storm for me here [2], let me
know if it has any access issues, I will put the diff on the mail itself in
text form.

*Second Approach: If we don't want to use the shaded classes*

1. The `hadoop-client-api` & the `hadoop-client-runtime` jars use shading,
which tends to conflict with your non-shaded `hadoop-minicluster`; rather
than using those jars, use the `hadoop-client` jar

2. I removed your hack & changed those two jars with `hadoop-client` jar &
the storm-hdfs tests passes

3. I am sharing the link to the commit in my fork, it is here at [3], one
advantage is, you don't have to change your existing jar nor you would need
to add those exclusions in the `storm-cred` dependency.

++ Adding common-dev, in case any fellow developers with more
experience around using the hadoop-client jars can help, if things still
don't work or Storm needs something more. The downstream projects which I
have experience with don't use these jars (which they should ideally) :-)

-Ayush


[1] https://github.com/apache/spark/blob/master/pom.xml#L1382
[2]
https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
[3]
https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8
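
For reference, the minicluster artifact only matters for test code along these lines;
a rough sketch using MiniDFSCluster, with nothing Storm-specific assumed:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MiniClusterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
                .numDataNodes(1)
                .build();
        try {
            cluster.waitActive();
            // The in-process HDFS behaves like a real one for client-side tests.
            FileSystem fs = cluster.getFileSystem();
            Path p = new Path("/tmp/smoke-test");
            fs.mkdirs(p);
            System.out.println("exists: " + fs.exists(p));
        } finally {
            cluster.shutdown();
        }
    }
}
```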


On Fri, 12 Apr 2024 at 10:41, Richard Zowalla  wrote:

> Hi,
>
> thanks for the fast reply. The PR is here [1].
>
> It works, if I exclude the client-api and client-api-runtime from being
> scanned in surefire, which is a hacky workaround for the actual issue.
>
> The hadoop-commons jar is a transient dependency of the minicluster, which
> is used for testing.
>
> Debugging the situation shows, that HttpServer2  is in the same package in
> hadoop-commons as well as in the client-api but with differences in methods
> / classes used, so depending on the classpath order the wrong class is
> loaded.
>
> Stacktraces are in the first GH Action run.here: [1].
>
> A reproducer would be to check out Storm, go to storm-hdfs and remove the
> exclusion in [2] and run the tests in that module, which will fail due to a
> missing jetty server class (as the HTTPServer2 class is loaded from
> client-api instead of minicluster).
>
> Gruß & Thx
> Richard
>
> [1] https://github.com/apache/storm/pull/3637
> [2]
> https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120
>
> On 2024/04/11 21:29:13 Ayush Saxena wrote:
> > Hi Richard,
> > I am not able to decode the issue properly here, It would have been
> > better if you shared the PR or the failure trace as well.
> > QQ: Why are you having hadoop-common as an explicit dependency? Those
> > hadoop-common stuff should be there in hadoop-client-api
> > I quickly checked once on the 3.4.0 release and I think it does have
> them.
> >
> > ```
> > ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
> > grep org/apache/hadoop/fs/FileSystem.class
> > org/apache/hadoop/fs/FileSystem.class
> > ``
> >
> > You didn't mention which shaded classes are being reported as
> > missing... I think spark uses 

Re: Recommended way of using hadoop-minicluster für unit testing?

2024-04-11 Thread Richard Zowalla
Hi,

thanks for the fast reply. The PR is here [1].

It works, if I exclude the client-api and client-api-runtime from being scanned 
in surefire, which is a hacky workaround for the actual issue.

The hadoop-common jar is a transitive dependency of the minicluster, which is 
used for testing.

Debugging the situation shows that HttpServer2 is in the same package in 
hadoop-common as well as in the client-api jar, but with differences in the 
methods / classes used, so depending on the classpath order the wrong class is loaded.

Stacktraces are in the first GH Action run, here: [1].

A reproducer would be to check out Storm, go to storm-hdfs and remove the 
exclusion in [2] and run the tests in that module, which will fail due to a 
missing jetty server class (as the HTTPServer2 class is loaded from client-api 
instead of minicluster).

Gruß & Thx
Richard 

[1] https://github.com/apache/storm/pull/3637
[2] 
https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120

On 2024/04/11 21:29:13 Ayush Saxena wrote:
> Hi Richard,
> I am not able to decode the issue properly here, It would have been
> better if you shared the PR or the failure trace as well.
> QQ: Why are you having hadoop-common as an explicit dependency? Those
> hadoop-common stuff should be there in hadoop-client-api
> I quickly checked once on the 3.4.0 release and I think it does have them.
> 
> ```
> ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
> grep org/apache/hadoop/fs/FileSystem.class
> org/apache/hadoop/fs/FileSystem.class
> ``
> 
> You didn't mention which shaded classes are being reported as
> missing... I think spark uses these client jars, you can use that as
> an example, can grab pointers from here: [1] & [2]
> 
> -Ayush
> 
> [1] https://github.com/apache/spark/blob/master/pom.xml#L1361
> [2] https://issues.apache.org/jira/browse/SPARK-33212
> 
> On Thu, 11 Apr 2024 at 17:09, Richard Zowalla  wrote:
> >
> > Hi all,
> >
> > we are using "hadoop-minicluster" in Apache Storm to test our hdfs
> > integration.
> >
> > Recently, we were cleaning up our dependencies and I noticed, that if I
> > am adding
> >
> > 
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-client-api</artifactId>
> >   <version>${hadoop.version}</version>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-client-runtime</artifactId>
> >   <version>${hadoop.version}</version>
> > </dependency>
> >
> > and have
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-minicluster</artifactId>
> >   <version>${hadoop.version}</version>
> >   <scope>test</scope>
> > </dependency>
> >
> > as a test dependency to setup a mini-cluster to test our storm-hdfs
> > integration.
> >
> > This fails weirdly because of missing (shaded) classes as well as a
> > class ambiguity with HttpServer2.
> >
> > It is present as a class inside of the "hadoop-client-api" and within
> > "hadoop-common".
> >
> > Is this setup wrong or should we try something different here?
> >
> > Gruß
> > Richard
> 



Re: [ANNOUNCE] Apache Hadoop 3.4.0 release

2024-04-11 Thread Sammi Chen
Xiaoqiao He and Shilun Fan

Awesome!  Thanks for leading the effort to release the Hadoop 3.4.0 !

Bests,
Sammi

On Tue, 19 Mar 2024 at 21:12, slfan1989  wrote:

> On behalf of the Apache Hadoop Project Management Committee, We are
> pleased to announce the release of Apache Hadoop 3.4.0.
>
> This is a release of Apache Hadoop 3.4 line.
>
> Key changes include
>
> * S3A: Upgrade AWS SDK to V2
> * HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
> * YARN Federation improvements
> * YARN Capacity Scheduler improvements
> * HDFS RBF: Code Enhancements, New Features, and Bug Fixes
> * HDFS EC: Code Enhancements and Bug Fixes
> * Transitive CVE fixes
>
> This is the first release of Apache Hadoop 3.4 line. It contains 2888 bug
> fixes, improvements and enhancements since 3.3.
>
> Users are encouraged to read the [overview of major changes][1].
> For details of please check [release notes][2] and [changelog][3].
>
> [1]: http://hadoop.apache.org/docs/r3.4.0/index.html
> [2]:
>
> http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/RELEASENOTES.3.4.0.html
> [3]:
>
> http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/CHANGELOG.3.4.0.html
>
> Many thanks to everyone who helped in this release by supplying patches,
> reviewing them, helping get this release building and testing and
> reviewing the final artifacts.
>
> Best Regards,
> Xiaoqiao He And Shilun Fan.
>


Re: Recommended way of using hadoop-minicluster für unit testing?

2024-04-11 Thread Ayush Saxena
Hi Richard,
I am not able to decode the issue properly here; it would have been
better if you shared the PR or the failure trace as well.
QQ: Why are you having hadoop-common as an explicit dependency? That
hadoop-common stuff should be there in hadoop-client-api.
I quickly checked once on the 3.4.0 release and I think it does have them.

```
ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
grep org/apache/hadoop/fs/FileSystem.class
org/apache/hadoop/fs/FileSystem.class
``

You didn't mention which shaded classes are being reported as
missing... I think spark uses these client jars, you can use that as
an example, can grab pointers from here: [1] & [2]

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1361
[2] https://issues.apache.org/jira/browse/SPARK-33212

On Thu, 11 Apr 2024 at 17:09, Richard Zowalla  wrote:
>
> Hi all,
>
> we are using "hadoop-minicluster" in Apache Storm to test our hdfs
> integration.
>
> Recently, we were cleaning up our dependencies and I noticed, that if I
> am adding
>
> 
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client-api</artifactId>
>   <version>${hadoop.version}</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client-runtime</artifactId>
>   <version>${hadoop.version}</version>
> </dependency>
>
> and have
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-minicluster</artifactId>
>   <version>${hadoop.version}</version>
>   <scope>test</scope>
> </dependency>
>
> as a test dependency to setup a mini-cluster to test our storm-hdfs
> integration.
>
> This fails weirdly because of missing (shaded) classes as well as a
> class ambiguity with HttpServer2.
>
> It is present as a class inside of the "hadoop-client-api" and within
> "hadoop-common".
>
> Is this setup wrong or should we try something different here?
>
> Gruß
> Richard




Recommended way of using hadoop-minicluster für unit testing?

2024-04-11 Thread Richard Zowalla
Hi all,

we are using "hadoop-minicluster" in Apache Storm to test our hdfs
integration.

Recently, we were cleaning up our dependencies and I noticed that if I
am adding


<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
</dependency>

and have

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>

as a test dependency to setup a mini-cluster to test our storm-hdfs
integration.

This fails weirdly because of missing (shaded) classes as well as a
class ambiguity with HttpServer2.

It is present as a class inside of the "hadoop-client-api" and within
"hadoop-common". 

Is this setup wrong or should we try something different here?

Gruß
Richard




Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone,

As part of The ASF’s 25th anniversary campaign[1], we will be celebrating
projects and communities in multiple ways.

We invite all projects and contributors to participate in the following
ways:

* Individuals - submit your first contribution:
https://news.apache.org/foundation/entry/the-asf-launches-firstasfcontribution-campaign
* Projects - share your public good story:
https://docs.google.com/forms/d/1vuN-tUnBwpTgOE5xj3Z5AG1hsOoDNLBmGIqQHwQT6k8/viewform?edit_requested=true
* Projects - submit a project spotlight for the blog:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278466116
* Projects - contact the Voice of Apache podcast (formerly Feathercast) to
be featured: https://feathercast.apache.org/help/
*  Projects - use the 25th anniversary template and the #ASF25Years hashtag
on social media:
https://docs.google.com/presentation/d/1oDbMol3F_XQuCmttPYxBIOIjRuRBksUjDApjd8Ve3L8/edit#slide=id.g26b0919956e_0_13

If you have questions, email the Marketing & Publicity team at
mark...@apache.org.

Peace,
BKP

[1] https://apache.org/asf25years/

[NOTE: You are receiving this message because you are a contributor to an
Apache Software Foundation project. The ASF will very occasionally send out
messages relating to the Foundation to contributors and members, such as
this one.]

Brian Proffitt
VP, Marketing & Publicity
VP, Conferences


Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project
dev or user
  mailing lists and is not being sent to you directly. It is important that
we reach all of our
  users and contributors/committers so that they may get a chance to
benefit from this.
  We apologise in advance if this doesn't interest you but it is on topic
for the mailing
  lists of the Apache Software Foundation; and it is important please that
you do not
  mark this as spam in your email client. Thank You! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA, Denver Colorado in
October 7th to the 10th 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this years applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Denver, Colorado , October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: ContainerId starts with 1 ?

2024-03-20 Thread 李响
Dear Hadoop/Yarn community,

I would still appreciate your help with the question above.

Additionally, I have a few more questions.
The goal is to get the driver container id of a Spark app from the YARN
aggregated log. I would like to call
LogAggregationIndexedFileController#readAggregatedLogsMeta(),

then get the first ContainerLogMeta from the list returned, then call
getContainerId() on it.
The questions are:

   1. Is the first ContainerLogMeta always the driver container?
   2. If the driver fails to come up the first time but succeeds on its
   second try, the container id will be incremented by 1 if I understand it
   correctly. In that case, will the first ContainerLogMeta returned by the
   function above be the first (failed) container or the second (successful)
   container? Or does the container id not change after a failure?

Thanks!


On Fri, Feb 23, 2024 at 4:21 PM 李响  wrote:

> Dear Hadoop/Yarn community,
>
> In Yarn, a container is represented as
> container_e<epoch>_<clusterTimestamp>_<appId>_<attemptId>_<containerId>
>
> Regarding the last section, "containerId", as the sequential number of
> containers, I notice it does not start with 0, but 1.
>
> My question is:
> 1. Is that observation correct?
> 2. Sorry, I could not find the code to support that. I read ContainerId.java
> and ContainerIdPBImpl.java but did not find the answer. Could you please
> show me the code path that makes it start with 1?
> 3. It seems counter-intuitive to me, as a programmer ^_^, who thinks the
> index should start with 0 rather than 1. If it is designed to start with
> 1, is there any background / thought / discussion to share?
>
> Thanks !!!
>
>
>
> --
>
>李响 Xiang Li
>
>
>

-- 

   李响 Xiang Li

手机 cellphone :+86-136-8113-8972
邮件 e-mail  :wate...@gmail.com


[ANNOUNCE] Apache Hadoop 3.4.0 release

2024-03-19 Thread slfan1989
On behalf of the Apache Hadoop Project Management Committee, We are
pleased to announce the release of Apache Hadoop 3.4.0.

This is a release of Apache Hadoop 3.4 line.

Key changes include

* S3A: Upgrade AWS SDK to V2
* HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
* YARN Federation improvements
* YARN Capacity Scheduler improvements
* HDFS RBF: Code Enhancements, New Features, and Bug Fixes
* HDFS EC: Code Enhancements and Bug Fixes
* Transitive CVE fixes

This is the first release of Apache Hadoop 3.4 line. It contains 2888 bug
fixes, improvements and enhancements since 3.3.

Users are encouraged to read the [overview of major changes][1].
For details, please check [release notes][2] and [changelog][3].

[1]: http://hadoop.apache.org/docs/r3.4.0/index.html
[2]:
http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/RELEASENOTES.3.4.0.html
[3]:
http://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-common/release/3.4.0/CHANGELOG.3.4.0.html

Many thanks to everyone who helped in this release by supplying patches,
reviewing them, helping get this release building and testing and
reviewing the final artifacts.

Best Regards,
Xiaoqiao He And Shilun Fan.


Re: Why does the constructor of org.apache.hadoop.fs.FileSystem.Cache.Key need a conf parameter?

2024-03-19 Thread Shuyan Zhang
hi 黄晟,
This is due to legacy code left over from history. In the past the conf was needed to obtain the ugi; later the way the ugi is obtained was changed, but the parameter was not removed. You can refer to
https://issues.apache.org/jira/browse/HADOOP-6299

黄晟 wrote on Monday, March 18, 2024 at 19:24:

>
>
> Why does the constructor of org.apache.hadoop.fs.FileSystem.Cache.Key need a conf parameter?
> The conf that is passed in is never used anywhere.
>
>
>
> 黄晟
> huangshen...@163.com
>
>


Why does the constructor of org.apache.hadoop.fs.FileSystem.Cache.Key need a conf parameter?

2024-03-18 Thread 黄晟


Why does the constructor of org.apache.hadoop.fs.FileSystem.Cache.Key need a conf parameter? The conf that is passed in is never used anywhere.



黄晟
huangshen...@163.com



YARN SLS web issue

2024-03-11 Thread hehaore...@gmail.com
My YARN version is 3.3.4, and I tested it on CentOS using YARN SLS (Scheduler Load Simulator). I found no relevant exceptions at runtime, and the real-time tracking JSON, jobruntime.csv, tableMapping.csv, and metrics-related data were all generated. I can access port 10001, which is the SLS web UI, and that works. But when I want to see the Simulation Charts and Tracked Jobs, there is no content. I followed the offline analysis method and copied the relevant files from CentOS to a Windows computer for viewing, but the data is still empty. Can you help me see where the problem is? Thank you.

Sent from Mail for Windows

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



[hdfs] [metrics] RpcAuthenticationSuccesses

2024-03-09 Thread Anatoly
Hi,

There is a question about two HDFS metrics that arose as a result of my attempts to calculate the load on the KDC for a production cluster.

There are two parameters in the HDFS metrics:
RpcAuthenticationSuccesses - Total number of successful authentication attempts
RpcAuthenticationFailures - Total number of authentication failures

I expect that any data request in the Hadoop cluster will involve a request to the KDC -> get ticket, then the request to the NameNode, after which the counter should increment: +1 to the successes metric if successful, or +1 to the failures metric if unsuccessful.

However, in a test cluster where I have 4 DataNodes and 2 NameNodes (HA), I see completely incomprehensible values for these metrics. By the way, at the same time, I noticed that the RpcAuthenticationSuccesses reading gradually increases by +1 every 30 seconds.

TEST 1
I made sure that:
1. Only the HDFS {NN, DN, JN, ZKFC} and YARN {RM, NM} services are running
2. All other components (hive, spark HistoryServer) are disabled
3. There are no YARN jobs running and no user requests to HDFS

At the time of testing:
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208322

To check the load, I ran the spark-submit test - spark-examples_2.12-3.5.0.jar with the number of executors = 1. The request completed in 1 minute and 20 seconds.
RpcAuthenticationSuccesses = 208338

In total, +16 was added to the original value during execution. Let's say +2 can be attributed to the +1 every 30 seconds I wrote about above. But what do the other +14 authentications mean?

TEST 2
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208388

hdfs dfs -ls /
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208389

Added +1. Why? I ran kinit long before the ls / request, i.e. the metric should not have changed, I think, but maybe I'm wrong.

TEST 3
Disabled: all DNs and the standby NN. All YARN services (RM, NM) are still running. Three JNs, the ZKFCs, and one active NN remain.

The counter continues to add +1 to the RpcAuthenticationSuccesses metric every 30 seconds.

Either I misunderstand the meaning of these indicators, or something is calculated incorrectly. Can you tell me how these indicators are calculated? And if I misunderstand how these metrics work, what is the correct interpretation?

Thank you very much

--
With best wishes,
Anatoliy

Hello and my first request for help: courses.

2024-03-06 Thread SysAdm CID
Hi, I hope this question is appropriate for this forum. If not, please
advise. Total beginner here.

I just finished a free introductory Hadoop course at a very popular online
school, it was very didactic and was good to get me started. But it was
based on a very old version of Hadoop (2.6.0), and many of the examples
didn't work with my test machine. (Good thing they provided a free online
cluster) I had to fudge the command line syntax all the time when testing
locally, and when trying a Java MapReduce example the dependencies were all
wrong.

Can you recommend some good courses (both admin and devel) based on 3.3.x,
or at least a version close enough to it to be compatible? Preferably
covering Hive and Impala too. Online would be best, but if I have to fly,
so be it.

Thanks in Advance,
Juan Carlos Castro


unsubscribe

2024-03-05 Thread Prashanth Salian



Frequent data blocks

2024-02-26 Thread Mohammad Aghanabi
Hi.

Are there any stats that show the number of accesses of each block in HDFS?
I read the WebHDFS documentation page but didn't find anything related to it.


Re: NM status during RM failover

2024-02-25 Thread Hariharan
> We observe a drop of the NumActiveNodes metric when failing over to a new RM.
Is that normal?

Yes, this does not seem unusual - the NMs will try to connect to the old RM
for some time before they fail over to the new RM. If this time exceeds the
heartbeat interval, the NMs may show up as disconnected until they reach
out to the new RM.

~ Hariharan
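
For reference, a minimal sketch of where that heartbeat interval is configured (the value shown is just the common default, not a tuning recommendation):

```
<!-- yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```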


On Sun, Feb 25, 2024 at 4:12 PM Dong Ye  wrote:

> Hi, All:
>
>   I have a question: in the high-availability ResourceManager
> scenario, how do the states of NodeManagers change if a new leader RM is
> elected? We observe a drop of the NumActiveNodes metric when failing over to a
> new RM. Is that normal? Is there any documentation explaining how the NM
> states will change? RM version is 2.8.5.
>
> Thanks.
> Have a nice day!
>


unsubscribe me

2024-02-24 Thread Man-Young Goo
please, unsubscribe me

Manyoung Goo
E-mail : my...@nate.com
Tel : +82-2-360-1590
-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org


Re: NM status during RM failover

2024-02-24 Thread Dong Ye
Hi, All:

How can we reduce RM failovers? They introduce disturbances to the current
workload. The failover is mainly because of JVM pauses (around 6 seconds) and
high CPU usage.

Thanks.
Have a nice day!
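
One related knob, offered only as a hedged sketch (the value is illustrative, and raising it trades faster failover detection for more tolerance of pauses): the RM's ZooKeeper session timeout, which, assuming the ZooKeeper-based elector/state store is in use, bounds how long a JVM pause can last before leadership is lost.

```
<!-- yarn-site.xml (illustrative value only) -->
<property>
  <name>yarn.resourcemanager.zk-timeout-ms</name>
  <value>20000</value>
</property>
```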

On Sat, Feb 24, 2024 at 8:06 PM Dong Ye  wrote:

> Hi, All:
>
>   I have a question: in the high-availability ResourceManager
> scenario, how do the states of NodeManagers change if a new leader RM is
> elected? We observe a drop of the NumActiveNodes metric when failing over to a
> new RM. Is that normal? Is there any documentation explaining how the NM
> states will change? RM version is 2.8.5.
>
> Thanks.
> Have a nice day!
>


NM status during RM failover

2024-02-24 Thread Dong Ye
Hi, All:

  I have a question: in the high-availability ResourceManager
scenario, how do the states of NodeManagers change if a new leader RM is
elected? We observe a drop of the NumActiveNodes metric when failing over to a
new RM. Is that normal? Is there any documentation explaining how the NM states
will change? RM version is 2.8.5.

Thanks.
Have a nice day!


ContainerId starts with 1 ?

2024-02-23 Thread 李响
Dear Hadoop/Yarn community,

In Yarn, a container is represented as
container_e<epoch>_<clusterTimestamp>_<appId>_<attemptId>_<containerId>

Regarding the last section, "containerId", as the sequential number of
containers, I notice it does not start with 0, but 1.

My question is:
1. Is that observation correct?
2. Sorry, I could not find the code to support that. I read ContainerId.java
and ContainerIdPBImpl.java but did not find the answer. Could you please
show me the code path that makes it start with 1?
3. It seems counter-intuitive to me, as a programmer ^_^, who thinks the
index should start with 0 rather than 1. If it is designed to start with
1, is there any background / thought / discussion to share?

Thanks !!!



-- 

   李响 Xiang Li


MagicS3GuardCommitter's compatibility with Tez

2024-02-20 Thread Venkatasubrahmanian Narayanan
Hello
I'm working on a mapred wrapper for the MagicS3GuardCommitter (necessary
for Hive since it cannot migrate to MRv2, and anything that interacts with
Hive tables like Pig). The wrapper just forwards calls to the mapreduce
version similar to how other mapred classes are implemented, and it works
correctly with the MR execution engine.

When using the committer for a simple Pig job which writes to a Hive table,
the job completes but none of the data is committed when using Tez as the
execution engine. Based on my investigations, it looks like the root cause
is that the MagicS3GuardCommitter assumes that the jobId of both the task
and the job are the same when determining where to write the PendingSet.
This assumption holds for MR, but not Tez. Reproducing the behavior
requires applying a patch to add the wrapper. Is this something with a
known workaround? Should I go ahead and open a JIRA for this?

If I should be e-mailing a different mailing list about this, let me know,
I wasn't sure which one I should go to for hadoop-aws issues.

Thanks
Venkatasubrahmanian Narayanan


Re: subscribe

2024-02-20 Thread Battula, Brahma Reddy
Please drop mail to 
"user-unsubscr...@hadoop.apache.org" 
as mentioned in the footer mail.

From: Shuyan Zhang 
Date: Thursday, February 1, 2024 at 09:00
To: user@hadoop.apache.org 
Subject: subscribe
subscribe


Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!

We will be supporting Community over Code Asia, Hangzhou, China
July 26th - 28th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, May 10th, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you to
apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Hangzhou, China in July, 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Query : Hadoop Cluster OS upgrade

2024-02-18 Thread Brahma Reddy Battula
Hi All,


Has anybody tried out, or can anybody share learnings from, using maintenance
state or upgrade domains for big data cluster OS upgrades?



Regards,
Brahma
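
For context, a minimal sketch of how maintenance state and upgrade domains are usually expressed, assuming the JSON-based combined hosts file is in use (dfs.namenode.hosts.provider.classname pointing at CombinedHostFileManager); the host names, domain names, and expiry timestamp below are placeholders only:

```
[
  {"hostName": "dn1.example.com", "upgradeDomain": "ud0"},
  {"hostName": "dn2.example.com", "upgradeDomain": "ud1",
   "adminState": "IN_MAINTENANCE", "maintenanceExpireTimeInMS": 1721000000000}
]
```

followed by hdfs dfsadmin -refreshNodes so the NameNode picks up the change.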


Re: unsubscribe

2024-02-10 Thread Brahma Reddy Battula
Please drop mail to "user-unsubscr...@hadoop.apache.org" as mentioned in
the footer mail.

On Fri, Feb 9, 2024 at 2:32 PM Henning Blohm 
wrote:

> unsubscribe
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>


unsubscribe

2024-02-09 Thread Henning Blohm

unsubscribe


-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Community over Code EU 2024 Travel Assistance Applications now open!

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU, Bratislava, Slovakia,
June 3th - 5th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this years applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


[no subject]

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU, Bratislava, Slovakia,
June 3th - 5th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this years applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to efficiently and accurately process their request), this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those that will need a Visa to enter the Country - we advise you apply
now so that you have enough time in case of interview delays. So do not
wait until you know if you have been accepted or not.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


unsubscribe

2024-02-01 Thread Jakub Stransky



subscribe

2024-01-31 Thread Shuyan Zhang
subscribe


Re: observer namenode and Router-based Federation

2024-01-26 Thread Ayush Saxena
RBF does support observer reads, it was added as part of
https://issues.apache.org/jira/browse/HDFS-16767

you need to go through it; there are different configs and other settings you might
need to set up to get RBF & Observer NN to work together.

-Ayush

On Fri, 26 Jan 2024 at 13:44, 尉雁磊  wrote:

> Can't observer namenode and Router-based Federation be used together? When I
> use RBF and at the same time configure
> org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider, it
> complains.
>


observer namenode and Router-based Federation

2024-01-25 Thread 尉雁磊
Can't observer namenode and Router-based Federation be used together? When I use
RBF and at the same time configure
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider, it
complains.

Unsubscribe

2024-01-23 Thread S B
Unsubscribe


refresh nodes denied

2024-01-20 Thread Dong Ye
Hi, Hadoop experts:

Do you know how to grant an account permission to run the "refreshNodes"
command? Thanks. Have a nice day.

The following is the exception:

2024-01-20 23:00:47,328 INFO
SecurityLogger.org.apache.hadoop.ipc.Server (Socket Reader #1 for port
8030): Auth successful for appattempt_1675084155030_1215916_01
(auth:SIMPLE)
2024-01-20 23:00:48,339 WARN
org.apache.hadoop.yarn.server.resourcemanager.AdminService (IPC Server
handler 0 on default port 8033): User hadoop doesn't have permission
to call 'refreshNodes'
2024-01-20 23:00:48,339 WARN
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger (IPC
Server handler 0 on default port 8033):
USER=hadoop IP=10.120.41.105OPERATION=refreshNodes  
TARGET=AdminService RESULT=FAILURE  DESCRIPTION=Unauthorized
userPERMISSIONS=
2024-01-20 23:00:48,340 INFO org.apache.hadoop.ipc.Server (IPC Server
handler 0 on default port 8033): IPC Server handler 0 on default port
8033, call Call#0 Retry#0
org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB.refreshNodes
from 10.120.41.105:55664
org.apache.hadoop.yarn.exceptions.YarnException:
org.apache.hadoop.security.AccessControlException: User hadoop doesn't
have permission to call 'refreshNodes'
at 
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:228)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshNodes(AdminService.java:450)
at 
org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshNodes(ResourceManagerAdministrationProtocolPBServiceImpl.java:144)
at 
org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:273)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:507)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1034)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2854)
Caused by: org.apache.hadoop.security.AccessControlException: User
hadoop doesn't have permission to call 'refreshNodes'
at 
org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:414)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:379)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAccess(AdminService.java:221)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAcls(AdminService.java:226)
... 11 more
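
For reference, the 'refreshNodes' check in the trace above is governed by the YARN admin ACL; a minimal sketch of the usual configuration (user and group names are placeholders):

```
<!-- yarn-site.xml: users come before the space, groups after it -->
<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.admin.acl</name>
  <value>hadoop,yarnadmin hadoopadmins</value>
</property>
```

After changing it, the new ACL can be picked up with yarn rmadmin -refreshAdminAcls (run as a user that already has admin rights) or by restarting the ResourceManagers.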


Jira account request

2024-01-18 Thread yarnfangwang
I have been involved in the secondary development of Hadoop for 2 years. 
The following article summarizes my experience in upgrading from Hadoop 2.7 to 
Hadoop 3.2 in-place at qihoo.360. 
https://zyun.360.cn/blog/?p=1458


Currently, I am eager to actively participate in the community and contribute 
to the development of YARN code, such as Global Scheduler and YARN Router.


Best wishes!

ha setup, dfs.namenode.shared.edits.dir (hadoop 3.3.6)

2024-01-15 Thread Michael

Hi,

it seems there are two methods to configure this:

a) with qjournal://node1:8485,node2:8485/clusterID

b) with a mounted nfs (nas) folder


What is the preferred method?


Thanks for answering
 Michael
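
For reference, a minimal sketch of the quorum-journal (a) form of this setting; the host names and nameservice ID are placeholders, and note that the hosts in the shared-edits URI are separated by semicolons:

```
<!-- hdfs-site.xml (placeholder hosts and nameservice) -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<property>
  <!-- local directory where each JournalNode stores the edits -->
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journal</value>
</property>
```

The QJM variant is the one the upstream HA documentation describes in most detail; the NFS-based shared-edits directory is the older alternative that relies on a single shared storage mount.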


-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Re: Data Remanence in HDFS

2024-01-13 Thread Jim Halfpenny
Hi Daniel,
In short you can’t create a HDFS block with unallocated data. You can create a 
zero length block, which will result in a zero byte file being created on the 
data node, but you can’t create a sparse file in HDFS. While HDFS has a block 
size e.g. 128MB if you create a small file then the file on the data node will 
be of a size directly proportional to the data and not the block length; 
creating a 32kB HDFS file will in turn create a single 32kB file on the 
datanodes. The way HDFS is built is not like a traditional file system with 
fixed size blocks/extents in fixed disk locations.

Kind regards,
Jim

> On 12 Jan 2024, at 18:35, Daniel Howard  wrote:
> 
> Thanks Jim,
> 
> The scenario I have in mind is something like:
> 1) Ask HDFS to create a file that is 32k in length.
> 2) Attempt to read the contents of the file.
> 
> Can I even attempt to read the contents of a file that has not yet been 
> written? If so, what data would get sent?
> 
> For example, I asked a version of this question of ganeti with regard to 
> creating VMs. You can, by default, read the previous contents of the disk in 
> your new VM, but they have an option to wipe newly allocated VM disks for 
> added security.[1]
> 
> [1]: https://groups.google.com/g/ganeti/c/-c_KoLd6mnI
> 
> Thanks,
> -danny
> 
> On Fri, Jan 12, 2024 at 8:03 AM Jim Halfpenny  
> wrote:
>> Hi Danny,
>> This does depend on a number of circumstances, mostly based on file 
>> permissions. If for example a file is deleted without the -skipTrash option 
>> then it will be moved to the .Trash directory. From here it could be read, 
>> but the original file permissions will be preserved. Therefore if a user did 
>> not have read access before it was deleted then it won’t be able to read it 
>> from .Trash and if they did have read access then this ought to remain the 
>> case.
>> 
>> If a file is deleted then the blocks are marked for deletion by the namenode 
>> and won’t be available through HDFS, but there will be some lag between the 
>> HDFS delete operation and the block files being removed from the datanodes. 
>> It’s possible that someone could read the block from the datanode file 
>> system directly, but not through the HDFS file system. The blocks will exist 
>> on disk until the datanode itself deletes them.
>> 
>> The way HDFS works you won’t get previous data when you create a new block 
>> since unallocated spaces doesn’t exist in the same way as it does on a 
>> regular file system. Each HDFS block maps to a file on the datanodes and 
>> block files can be an arbitrary size, unlike the fixed block/extent size of 
>> a regular file system. You don’t “reuse" HDFS blocks, a block in HDFS is 
>> just a file on the data node. You could potentially recover data from 
>> unallocated space on the datanode disk the same way you would for any other 
>> deleted file.
>> 
>> If you want to remove the chance of data recovery on HDFS then encrypting 
>> the blocks using HDFS transparent encryption is the way to do it. They 
>> encryption keys reside in the namenode metadata so once they are deleted the 
>> data in that file is effectively lost. Beware of snapshots though since a 
>> deleted file in the live HDFS view may exist in a previous snapshot.
>> 
>> Kind regards,
>> Jim
>> 
>> 
>>> On 11 Jan 2024, at 21:50, Daniel Howard wrote:
>>> 
>>> Is it possible for a user with HDFS access to read the contents of a file 
>>> previously deleted by a different user?
>>> 
>>> I know a user can employ KMS to encrypt files with a personal key, making 
>>> this sort of data leakage effectively impossible. But, without KMS, is it 
>>> possible to allocate a file with uninitialized data, and then read the data 
>>> that exists on the underlying disk?
>>> 
>>> Thanks,
>>> -danny
>>> 
>>> --
>>> http://dannyman.toldme.com 
> 
> 
> --
> http://dannyman.toldme.com 


Re: Data Remanence in HDFS

2024-01-12 Thread Daniel Howard
Thanks Jim,

The scenario I have in mind is something like:
1) Ask HDFS to create a file that is 32k in length.
2) Attempt to read the contents of the file.

Can I even attempt to read the contents of a file that has not yet been
written? If so, what data would get sent?

For example, I asked a version of this question of ganeti with regard to
creating VMs. You can, by default, read the previous contents of the disk
in your new VM, but they have an option to wipe newly allocated VM disks
for added security.[1]

[1]: https://groups.google.com/g/ganeti/c/-c_KoLd6mnI

Thanks,
-danny

On Fri, Jan 12, 2024 at 8:03 AM Jim Halfpenny 
wrote:

> Hi Danny,
> This does depend on a number of circumstances, mostly based on file
> permissions. If for example a file is deleted without the -skipTrash option
> then it will be moved to the .Trash directory. From here it could be read,
> but the original file permissions will be preserved. Therefore if a user
> did not have read access before it was deleted then it won’t be able to
> read it from .Trash and if they did have read access then this ought to
> remain the case.
>
> If a file is deleted then the blocks are marked for deletion by the
> namenode and won’t be available through HDFS, but there will be some lag
> between the HDFS delete operation and the block files being removed from
> the datanodes. It’s possible that someone could read the block from the
> datanode file system directly, but not through the HDFS file system. The
> blocks will exist on disk until the datanode itself deletes them.
>
> The way HDFS works you won’t get previous data when you create a new block
> since unallocated spaces doesn’t exist in the same way as it does on a
> regular file system. Each HDFS block maps to a file on the datanodes and
> block files can be an arbitrary size, unlike the fixed block/extent size of
> a regular file system. You don’t “reuse" HDFS blocks, a block in HDFS is
> just a file on the data node. You could potentially recover data from
> unallocated space on the datanode disk the same way you would for any other
> deleted file.
>
> If you want to remove the chance of data recovery on HDFS then encrypting
> the blocks using HDFS transparent encryption is the way to do it. They
> encryption keys reside in the namenode metadata so once they are deleted
> the data in that file is effectively lost. Beware of snapshots though since
> a deleted file in the live HDFS view may exist in a previous snapshot.
>
> Kind regards,
> Jim
>
>
> On 11 Jan 2024, at 21:50, Daniel Howard  wrote:
>
> Is it possible for a user with HDFS access to read the contents of a file
> previously deleted by a different user?
>
> I know a user can employ KMS to encrypt files with a personal key, making
> this sort of data leakage effectively impossible. But, without KMS, is it
> possible to allocate a file with uninitialized data, and then read the data
> that exists on the underlying disk?
>
> Thanks,
> -danny
>
> --
> http://dannyman.toldme.com
>
>
>

-- 
http://dannyman.toldme.com


Re: Data Remanence in HDFS

2024-01-12 Thread Jim Halfpenny
Hi Danny,
This does depend on a number of circumstances, mostly based on file 
permissions. If for example a file is deleted without the -skipTrash option 
then it will be moved to the .Trash directory. From here it could be read, but 
the original file permissions will be preserved. Therefore if a user did not 
have read access before it was deleted then it won’t be able to read it from 
.Trash and if they did have read access then this ought to remain the case.

If a file is deleted then the blocks are marked for deletion by the namenode 
and won’t be available through HDFS, but there will be some lag between the 
HDFS delete operation and the block files being removed from the datanodes. 
It’s possible that someone could read the block from the datanode file system 
directly, but not through the HDFS file system. The blocks will exist on disk 
until the datanode itself deletes them.

The way HDFS works you won’t get previous data when you create a new block 
since unallocated spaces doesn’t exist in the same way as it does on a regular 
file system. Each HDFS block maps to a file on the datanodes and block files 
can be an arbitrary size, unlike the fixed block/extent size of a regular file 
system. You don’t “reuse" HDFS blocks, a block in HDFS is just a file on the 
data node. You could potentially recover data from unallocated space on the 
datanode disk the same way you would for any other deleted file.

If you want to remove the chance of data recovery on HDFS then encrypting the 
blocks using HDFS transparent encryption is the way to do it. They encryption 
keys reside in the namenode metadata so once they are deleted the data in that 
file is effectively lost. Beware of snapshots though since a deleted file in 
the live HDFS view may exist in a previous snapshot.

Kind regards,
Jim
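
For reference, a minimal sketch of the transparent-encryption setup mentioned above (the key name and path are placeholders, and a KMS must already be configured):

```
hadoop key create mykey
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName mykey -path /secure
```

Files written under /secure are then encrypted with per-file keys wrapped by mykey, so once the key material is destroyed the data is effectively unrecoverable.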


> On 11 Jan 2024, at 21:50, Daniel Howard  wrote:
> 
> Is it possible for a user with HDFS access to read the contents of a file 
> previously deleted by a different user?
> 
> I know a user can employ KMS to encrypt files with a personal key, making 
> this sort of data leakage effectively impossible. But, without KMS, is it 
> possible to allocate a file with uninitialized data, and then read the data 
> that exists on the underlying disk?
> 
> Thanks,
> -danny
> 
> --
> http://dannyman.toldme.com 


Re: I don't want to set quotas through the router

2024-01-12 Thread Ayush Saxena
Hi,
Your question is not very clear. So, I am answering whatever I understand.

1. You don't want Router to manage Quotas?
Ans: Then you can use this config: dfs.federation.router.quota.enable
and set it to false

2. You have default NS as Router but you want to set Quota individually to NS?
Ans. Then use generic options in DFSAdmin

3. You want to set Quota on /path & it should set quota on NS1
/somePath & NS2 /somePath at same time?
Ans. You should explore mount entries with multiple
destinations (MultipleDestinationMountTableResolver); RBF supports
that, so if a path resolves to multiple destinations in different NS,
it would set the same quota on all the target destinations. If it is a
mount entry you need to go via DfsRouterAdmin, else normal DfsAdmin
should do...

-Ayush
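
For reference, a minimal sketch of (1), assuming it goes into hdfs-site.xml on the Routers:

```
<property>
  <name>dfs.federation.router.quota.enable</name>
  <value>false</value>
</property>
```

and for (2), the per-namespace form would look something like hdfs dfsadmin -fs hdfs://ns1 -setSpaceQuota 10g /path, where ns1 and the path are placeholders.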

On Fri, 12 Jan 2024 at 12:16, 尉雁磊  wrote:
>
> Hello everyone, our cluster recently deployed router federation. Because an
> upper-layer custom component depends on setting and getting quotas
> without the router, we do not want to set quotas through the router.
> From my test, hdfs dfsadmin -setSpaceQuota /path cannot be executed through
> the router. I want executing hdfs dfsadmin -setSpaceQuota /path through the
> router to take effect in both clusters at the same time, which would achieve
> the desired effect. This is the smallest change for us. Is the approach I described supported?
>

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Does dfsrouteradmin have a java api

2024-01-11 Thread 尉雁磊
Does dfsrouteradmin have a java api?


I don't want to set quotas through the router

2024-01-11 Thread 尉雁磊
Hello everyone, our cluster recently deployed router federation. Because an
upper-layer custom component depends on setting and getting quotas
without the router, we do not want to set quotas through the router.
From my test, hdfs dfsadmin -setSpaceQuota /path cannot be executed through the
router. I want executing hdfs dfsadmin -setSpaceQuota /path through the router
to take effect in both clusters at the same time, which would achieve the desired
effect. This is the smallest change for us. Is the approach I described supported?



Data Remanence in HDFS

2024-01-11 Thread Daniel Howard
Is it possible for a user with HDFS access to read the contents of a file
previously deleted by a different user?

I know a user can employ KMS to encrypt files with a personal key, making
this sort of data leakage effectively impossible. But, without KMS, is it
possible to allocate a file with uninitialized data, and then read the data
that exists on the underlying disk?

Thanks,
-danny

-- 
http://dannyman.toldme.com


Unsubscribe me

2024-01-06 Thread Man-Young Goo
​Manyoung Goo E-mail : my...@nate.comTel : +82-2-360-1590
-- Original Message --
Date: Wednesday, Jan 3, 2024 04:29:19 PM
From: "Ramkumar B" 
To: ,  ,  
Subject: Unsubscribe




unsubscribe me

2024-01-06 Thread Man-Young Goo
unsubscribe me

Manyoung Goo
E-mail : my...@nate.com
Tel : +82-2-360-1590
-- Original Message --
Date: Wednesday, Oct 4, 2023 01:34:02 PM
From: "Kiyoshi Mizumaru" 
To: "Harry Jamison" 
Cc: "Hadoop Users" 
Subject: Re: HDFS HA standby

Why don't you try to change the logging level? DEBUG or TRACE would be helpful.

On Wed, Oct 4, 2023 at 1:13 PM Harry Jamison  wrote:

I am not sure exactly what the problem is now.
My namenode (and I think journal node) are getting shut down.
Is there a way to tell why it is getting the shutdown signal?
Also the datanode seems to be getting this error: End of File Exception between local host is...

Here are the logs, and I only see INFO logging, and then the shutdown:

[2023-10-03 20:53:00,873] INFO Initializing quota with 12 thread(s) (org.apache.hadoop.hdfs.server.namenode.FSDirectory)
[2023-10-03 20:53:00,876] INFO Quota initialization completed in 1 milliseconds
name space=2
storage space=0
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 (org.apache.hadoop.hdfs.server.namenode.FSDirectory)
[2023-10-03 20:53:00,882] INFO Total number of blocks            = 0 (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 20:53:00,884] INFO Starting CacheReplicationMonitor with interval 3 milliseconds (org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor)
[2023-10-03 20:53:00,884] INFO Number of invalid blocks          = 0 (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 20:53:00,884] INFO Number of under-replicated blocks = 0 (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 20:53:00,884] INFO Number of  over-replicated blocks = 0 (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 20:53:00,884] INFO Number of blocks being written    = 0 (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 20:53:00,884] INFO STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 67 msec (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 20:54:16,453] ERROR RECEIVED SIGNAL 15: SIGTERM (org.apache.hadoop.hdfs.server.namenode.NameNode)
[2023-10-03 20:54:16,467] INFO SHUTDOWN_MSG: Shutting down NameNode at vmnode1/192.168.1.159 (org.apache.hadoop.hdfs.server.namenode.NameNode)

When I start the data node I see this:

[2023-10-03 20:53:00,882] INFO Namenode Block pool BP-1620264838-192.168.1.159-1696370857417 (Datanode Uuid 66068658-b08b-49cd-aba0-56ac1f29e7d5) service to vmnode1/192.168.1.159:8020 trying to claim ACTIVE state with txid=15 (org.apache.hadoop.hdfs.server.datanode.DataNode)
[2023-10-03 20:53:00,882] INFO Acknowledging ACTIVE Namenode Block pool BP-1620264838-192.168.1.159-1696370857417 (Datanode Uuid 66068658-b08b-49cd-aba0-56ac1f29e7d5) service to vmnode1/192.168.1.159:8020 (org.apache.hadoop.hdfs.server.datanode.DataNode)
[2023-10-03 20:53:00,882] INFO After receiving heartbeat response, updating state of namenode vmnode1:8020 to active (org.apache.hadoop.hdfs.server.datanode.DataNode)
[2023-10-03 20:54:18,771] WARN IOException in offerService (org.apache.hadoop.hdfs.server.datanode.DataNode)
java.io.EOFException: End of File Exception between local host is: "vmnode1/192.168.1.159"; destination host is: "vmnode1":8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:930)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:879)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1571)
	at org.apache.hadoop.ipc.Client.call(Client.java:1513)
	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
	at com.sun.proxy.$Proxy19.sendHeartbeat(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:168)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:562)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:710)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.EOFException
	at

Unsubscribe

2024-01-02 Thread Ramkumar B



Unsubscribe

2024-01-02 Thread Yin He



Re: JSON in Kafka -> ORC in HDFS - Thoughts on different tools?

2023-12-10 Thread Aaron Grubb
Hi Michal,

Thanks for your detailed reply, it was very helpful. The motivation for 
replacing Kafka Connect is mostly related to having to run backfills from 
time-to-time - we store all the raw data from Kafka Connect, extract the fields 
we're currently using, then drop the extracted data and keep the raw JSON, and 
that's fine in the case that backfilling is never needed, but when it becomes 
necessary, processing 90+ days of JSON at 12+ billion rows per day using Hive 
LLAP is excruciatingly slow. Therefore we wanted to have the data in ORC format 
as early as possible instead of adding an intermediate job to transform the 
JSON to ORC in the current pipeline. Changing this part of the pipeline over 
should also result in an overall reduction of resources used - nothing crazy 
for this first topic that we're changing over but if it goes well, we have a 
few Kafka Connect clusters that we would be interested in converting, and that 
would also free up a ton of CPU time in Hive.

Thanks,
Aaron


On Thu, 2023-12-07 at 13:32 +0100, Michal Klempa wrote:
Hi Aaron,
I do not know Gobblin, so no advice there.

You write that currently Kafka Connect dumps to files, as you probably already 
know, Kafka Connect can't do the aggregation.
To my knowledge, NiFi is also ETL with local Transformation, there is no state 
maintenance on a global scale. You can write processors to do stateful 
transformation, but for this task, would be tedious in my opinion. I would put 
NiFi out of the game.

Now to the requirements, I assume, you have:
- large volume (~1k events per second)
- small messages (<1k JSONs)
- need to have data in near real-time (seconds at most) after a window 
(aggregation) is triggered for data stakeholders to query


Then, it makes sense to think of doing the aggregation on-the-fly, in a 
real-time framework, i.e. real-time ETL non-stop running job.
If your use-case is not satisfying the criteria, e.g low volume, or no 
real-time need (1 to 5 minutes lateness is fine), I would strongly encourage to 
avoid using real-time stateful streaming, as it is complicated to setup, scale, 
maintain, run and mostly: code bug-free. It is a non-stop running application, 
any memory leak > you have restart on OOM every couple of hours. It is hard.

You may have:
- high volume + no real-time (5 minutes lateness is fine)

In that case, running any pyspark every 5 minutes with ad-hoc AWS spot 
instances cluster with batch job is just fine.

You may have:
- low volume + no real-time (5 minutes lateness is fine)
In that case, just run plain 1 instance python script doing the job, 1k to 100k 
events you can just consume from Kafka directly, pack into ORC, and dump on 
S3/HDFS on a single CPU. Use any cron to run it every 5 minutes. Done.


In case your use case is:
- large volume + real-time
for this, Flink and Spark Structured Streaming are both good fit, but there is 
also a thing called Kafka Streams, I would suggest to add this as a competitor. 
Also there is Beam(Google Dataflow), if you are on GCP already. All of them do 
the same job.


Flink vs. Spark Structured Streaming vs. Kafka Streams:
Deployment: Kafka Streams is just one fat-jar, Flink+Spark - you need to 
maintain clusters, but both frameworks are working on being k8s native, but... 
not easy to setup either.
Coding: everything is JVM, Spark has python, Flink added python too. Seems 
there are some python attempts on Kafka Streams approach, no experience though.
Fault Tolerance: I have real-world experience with Flink+Spark Structured 
Streaming, both can restart from checkpoints, Flink have also savepoints which 
is a good feature to start a new job after modifications (but also not easy to 
setup).
Automatically Scalable: I think none of the open-source has this feature 
out-of-the-box (correct me if wrong). You may want to pay Ververica platform 
(Flink authors offering), Databricks (Spark authors offering), there must be 
something from Confluent or competitors,too. Google of course has its Dataflow 
(Beam API). All auto-scaling is pain however, each rescale means reshuffle of 
the data.
Exactly once: To my knowledge, only Flink nowadays offers end-to-end exactly 
once and I would not be sure whether that can be achieved with ORC on HDFS as 
destination. Maybe idempontent ORC writer can be used or some other form of 
"transaction" on the destination must exist.

All in all, if I would be solving your problem, I would first attack the 
requirements list. Whether it can't be done easier. If not, Flink would be my 
choice as I had good experience with it and you can really hack anything 
inside. But prepare yourself, that the requirement list is hard, even if you 
get pipeline up in 2 weeks, you surely will re-iterate the decision 
after some incidents in next 6 months.
If you loosen requirements a bit, it becomes easier and easier. Your current 
solution sounds very reasonable to me. You picked something that works out of 
the box (Kafka Connect) and 

Re: JSON in Kafka -> ORC in HDFS - Thoughts on different tools?

2023-12-07 Thread Michal Klempa
Hi Aaron,
I do not know Gobblin, so no advice there.

You write that currently Kafka Connect dumps to files, as you probably
already know, Kafka Connect can't do the aggregation.
To my knowledge, NiFi is also ETL with local Transformation, there is no
state maintenance on a global scale. You can write processors to do
stateful transformation, but for this task, would be tedious in my opinion.
I would put NiFi out of the game.

Now to the requirements, I assume, you have:
- large volume (~1k events per second)
- small messages (<1k JSONs)
- need to have data in near real-time (seconds at most) after a window
(aggregation) is triggered for data stakeholders to query

Then, it makes sense to think of doing the aggregation on-the-fly, in a
real-time framework, i.e. real-time ETL non-stop running job.
If your use-case is not satisfying the criteria, e.g low volume, or no
real-time need (1 to 5 minutes lateness is fine), I would strongly
encourage to avoid using real-time stateful streaming, as it is complicated
to setup, scale, maintain, run and mostly: code bug-free. It is a non-stop
running application, any memory leak > you have restart on OOM every couple
of hours. It is hard.

You may have:
- high volume + no real-time (5 minutes lateness is fine)
In that case, running any pyspark every 5 minutes with ad-hoc AWS spot
instances cluster with batch job is just fine.

You may have:
- low volume + no real-time (5 minutes lateness is fine)
In that case, just run plain 1 instance python script doing the job, 1k to
100k events you can just consume from Kafka directly, pack into ORC, and
dump on S3/HDFS on a single CPU. Use any cron to run it every 5 minutes.
Done.

In case your use case is:
- large volume + real-time
for this, Flink and Spark Structured Streaming are both good fit, but there
is also a thing called Kafka Streams, I would suggest to add this as a
competitor. Also there is Beam(Google Dataflow), if you are on GCP already.
All of them do the same job.

Flink vs. Spark Structured Streaming vs. Kafka Streams:
Deployment: Kafka Streams is just one fat-jar, Flink+Spark - you need to
maintain clusters, but both frameworks are working on being k8s native,
but... not easy to setup either.
Coding: everything is JVM, Spark has python, Flink added python too. Seems
there are some python attempts on Kafka Streams approach, no experience
though.
Fault Tolerance: I have real-world experience with Flink+Spark Structured
Streaming, both can restart from checkpoints, Flink have also savepoints
which is a good feature to start a new job after modifications (but also
not easy to setup).
Automatically Scalable: I think none of the open-source has this feature
out-of-the-box (correct me if wrong). You may want to pay Ververica
platform (Flink authors offering), Databricks (Spark authors offering),
there must be something from Confluent or competitors,too. Google of course
has its Dataflow (Beam API). All auto-scaling is pain however, each rescale
means reshuffle of the data.
Exactly once: To my knowledge, only Flink nowadays offers end-to-end
exactly once and I would not be sure whether that can be achieved with ORC
on HDFS as destination. Maybe idempontent ORC writer can be used or some
other form of "transaction" on the destination must exist.

All in all, if I would be solving your problem, I would first attack the
requirements list. Whether it can't be done easier. If not, Flink would be
my choice as I had good experience with it and you can really hack anything
inside. But prepare yourself, that the requirement list is hard, even if
you get pipeline up in 2 weeks, you surely will re-iterate the
decision after some incidents in next 6 months.
If you loosen requirements a bit, it becomes easier and easier. Your
current solution sounds very reasonable to me. You picked something that
works out of the box (Kafka Connect) and done ELT, where something that
can aggregate out of the box (Hive) does it. Why exactly do you need to
replace it?

Good luck, M.

On Fri, Dec 1, 2023 at 11:38 AM Aaron Grubb  wrote:

> Hi all,
>
> Posting this here to avoid biases from the individual mailing lists on why
> the product they're using is the best. I'm analyzing tools to
> replace a section of our pipeline with something more efficient. Currently
> we're using Kafka Connect to take data from Kafka and put it into
> S3 (not HDFS cause the connector is paid) in JSON format, then Hive reads
> JSON from S3 and creates ORC files in HDFS after a group by. I
> would like to replace this with something that reads Kafka, applies
> aggregations and windowing in-place and writes HDFS directly. I know that
> the impending Hive 4 release will support this but Hive LLAP is *very*
> slow when processing JSON. So far I have a working PySpark application
> that accomplishes this replacement using structured streaming + windowing,
> however the decision to evaluate Spark was based on there being a
> use for Spark in other areas, so I'm interested in 

JSON in Kafka -> ORC in HDFS - Thoughts on different tools?

2023-12-01 Thread Aaron Grubb
Hi all,

Posting this here to avoid biases from the individual mailing lists on why the 
product they're using is the best. I'm analyzing tools to
replace a section of our pipeline with something more efficient. Currently 
we're using Kafka Connect to take data from Kafka and put it into
S3 (not HDFS cause the connector is paid) in JSON format, then Hive reads JSON 
from S3 and creates ORC files in HDFS after a group by. I
would like to replace this with something that reads Kafka, applies 
aggregations and windowing in-place and writes HDFS directly. I know that
the impending Hive 4 release will support this but Hive LLAP is *very* slow 
when processing JSON. So far I have a working PySpark application
that accomplishes this replacement using structured streaming + windowing, 
however the decision to evaluate Spark was based on there being a
use for Spark in other areas, so I'm interested in getting opinions on other 
tools that may be better for this use case based on resource
usage, ease of use, scalability, resilience, etc.

In terms of absolute must-haves:
- Read JSON from Kafka
- Ability to summarize data over a window
- Write ORC to HDFS
- Fault tolerant (can tolerate anywhere from a single machine to an entire 
cluster going offline unexpectedly while maintaining exactly-once
guarantees)

Nice-to-haves:
- Automatically scalable, both up and down (doesn't matter if standalone, YARN, 
Kubernetes, etc)
- Coding in Java not required

So far I've found that Gobblin, Flink and NiFi seem capable of accomplishing 
what I'm looking for, but neither I nor anyone at my company has
any experience with those products, so I was hoping to get some opinions on 
what the users here would choose and why. I'm also open to other
tools that I'm not yet aware of.

Thanks for your time,
Aaron






HDFS native client seems doesn't free stale exception string

2023-12-01 Thread Walter Mitty
Hello,

When I read the code of libhdfs, I found that the last exception string
is not cleared before functions are called.

In more detail, an exception is saved in `ThreadLocalState` by
`setTLSExceptionStrings` if it is thrown during a function call, and a
subsequent call to `hdfsGetLastExceptionRootCause` returns the saved
exception string. But the problem is that even if a later call to another
function, e.g. `hdfsExists`, returns success,
`hdfsGetLastExceptionRootCause` still returns the former exception string.

The related code is in:
https://github.com/apache/hadoop/blob/5cda162a804fb0cfc2a5ac0058ab407662c5fb00/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c#L795-L809

Does anyone know if this behavior is expected?

Thanks


Re: INFRA-25203

2023-11-27 Thread Peter Boot
unsubscribe

On Mon, 27 Nov 2023, 11:26 pm Drew Foulks,  wrote:

> Redirect test.
>
> --
> Cheers,
>
> Drew Foulks
>  ASF Infra
>
>
>


Re: Details about cluster balancing

2023-11-27 Thread Akash Jain
Thanks Ayush!

> On 15-Nov-2023, at 10:59 PM, Ayush Saxena  wrote:
> 
> Hi Akash,
> You can read about balancer here:
> https://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
> HADOOP-1652(https://issues.apache.org/jira/browse/HADOOP-1652) has
> some details around it as well, it has some docs attached to it, you
> can read them...
> For the code, you can explore something over here:
> https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L473-L479
> 
> -Ayush
> 
> On Sun, 5 Nov 2023 at 22:33, Akash Jain  wrote:
>> 
>> Hello,
>> 
>> For my project, I am analyzing an algorithm to balance the disk usage across 
>> thousands of storage nodes across different availability zones.
>> 
>> Let’s say
>> Availability zone 1
>> Disk usage for data of customer 1 is 70%
>> Disk usage for data of customer 2 is 10%
>> 
>> Availability zone 2
>> Disk usage for data of customer 1 is 30%
>> Disk usage for data of customer 2 is 90%
>> 
>> and so forth…
>> 
>> Clearly in above example customer 1 data has much higher data locality in 
>> AZ1 compared to AZ2. Similarly for customer 2 data it is more data locality 
>> in AZ1 compared to AZ1
>> 
>> In an ideal world, the data of the customers would look something like this
>> 
>> 
>> Availability zone 1
>> Disk usage for data of customer 1 is 50%
>> Disk usage for data of customer 2 is 50%
>> 
>> Availability zone 2
>> Disk usage for data of customer 1 is 50%
>> Disk usage for data of customer 2 is 50%
>> 
>> 
>> HDFS Balancer looks related, however I have some questions:
>> 
>> 1. Why does the algorithm tries to pair an over utilized node with under 
>> utilized instead of every node holding average data?
>> (https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/data-storage/content/step_2__storage_group_pairing.html)
>> 
>> 2. Where can I find more algorithmic details of how the pairing happens?
>> 
>> 3. Is this the only balancing algorithm supported by HDFS?
>> 
>> Thanks
> 





INFRA-25203

2023-11-27 Thread Drew Foulks
redirect test

-- 
Cheers,

Drew Foulks
 ASF Infra


INFRA-25203

2023-11-27 Thread Drew Foulks
Redirect test.

-- 
Cheers,

Drew Foulks
 ASF Infra


INFRA-25203

2023-11-27 Thread Drew Foulks
testing redirect

-- 
Cheers,

Drew Foulks
 ASF Infra


Query Regarding Input Split

2023-11-22 Thread Pallavi Balwani
Let's say we are performing MapReduce on a file whose size is 130 MB, and
we are using TextInputFormat to process it line by line. Now suppose that file
contains only one line.
Theoretically it is divided into 2 input splits, so there would be 2
mappers. But actually there has to be only 1 input split.
So how does MapReduce decide that?
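
For what it's worth, a hedged illustration of the split math (mirroring the
logic of FileInputFormat rather than quoting it): a trailing chunk only becomes
its own split when it exceeds 10% of the split size, so a 130 MB file on a
128 MB block size yields a single split.

public class SplitMath {
  // Same 10% slack factor FileInputFormat uses (SPLIT_SLOP).
  private static final double SPLIT_SLOP = 1.1;

  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  static int numberOfSplits(long fileLength, long splitSize) {
    int splits = 0;
    long remaining = fileLength;
    while (((double) remaining) / splitSize > SPLIT_SLOP) {
      splits++;
      remaining -= splitSize;
    }
    if (remaining > 0) {
      splits++;  // the tail (within the slack) becomes the last split
    }
    return splits;
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;
    long fileLength = 130L * 1024 * 1024;
    long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
    // 130 MB / 128 MB is about 1.016 <= 1.1, so this prints 1: one split, one mapper.
    System.out.println(numberOfSplits(fileLength, splitSize));
  }
}

Even when a file does end up with two splits, a logical record that crosses the
boundary is not cut in half: the reader for the first split reads past the split
end to finish the line, and the reader for the next split skips everything up to
the first line break it sees, so a single-line file is still processed by exactly
one mapper.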


cgroup v2 support for Yarn

2023-11-16 Thread Yan Yan
Hello,

As operating systems (e.g. AL2023) start to migrate from cgroup v1 to
cgroup v2, I wonder if there's any plan to support cgroup v2 in Yarn.
Specifically, I noticed that in Yarn, relevant classes like CGroupsHandler
 and CGroupsHandlerImpl have assumptions based on the cgroup v1 filesystem
hierarchy (e.g. /sys/fs/cgroup// ) which is no longer
true in v2 (changed to something like /sys/fs/cgroup//)
per https://www.man7.org/linux/man-pages/man7/cgroups.7.html among other
breaking changes. And this is impacting GPU plugin bootstrap and limiting
CPU usage with CgroupsLCEResourcesHandler.

Do people work around this problem by mounting a v1 cgroup in a different
location? Is there any plan to officially support cgroup v2 (did some
research and couldn't find any)? Please let me know if I missed anything,
or I can create a Jira ticket to track this effort if needed.
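
Not an authoritative answer, but until native support exists one practical step
is to detect which hierarchy the node actually mounts before choosing a path
scheme: on a v2 (unified) mount, the controller list lives in a
cgroup.controllers file at the mount root, while v1 exposes one directory per
controller. A small illustrative probe (not YARN code; names are invented):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class CgroupVersionProbe {
  // On cgroup v2 the unified mount root contains a "cgroup.controllers" file;
  // on cgroup v1 the root contains per-controller directories instead.
  public static boolean isCgroupV2(String cgroupRoot) {
    Path controllers = Paths.get(cgroupRoot, "cgroup.controllers");
    return Files.isRegularFile(controllers);
  }

  public static void main(String[] args) {
    String root = "/sys/fs/cgroup";
    System.out.println(isCgroupV2(root) ? "cgroup v2 (unified)" : "cgroup v1 (per-controller)");
  }
}

Some deployments on v2-only distributions work around the gap by booting the
host back into the legacy layout (e.g. systemd.unified_cgroup_hierarchy=0) or
mounting a v1 hierarchy alongside, until first-class v2 support lands.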

Thank you!
Yan


Re: Details about cluster balancing

2023-11-15 Thread Ayush Saxena
Hi Akash,
You can read about balancer here:
https://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
HADOOP-1652(https://issues.apache.org/jira/browse/HADOOP-1652) has
some details around it as well, it has some docs attached to it, you
can read them...
For the code, you can explore something over here:
https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L473-L479

-Ayush

On Sun, 5 Nov 2023 at 22:33, Akash Jain  wrote:
>
> Hello,
>
> For my project, I am analyzing an algorithm to balance the disk usage across 
> thousands of storage nodes across different availability zones.
>
> Let’s say
> Availability zone 1
> Disk usage for data of customer 1 is 70%
> Disk usage for data of customer 2 is 10%
>
> Availability zone 2
> Disk usage for data of customer 1 is 30%
> Disk usage for data of customer 2 is 90%
>
> and so forth…
>
> Clearly in above example customer 1 data has much higher data locality in AZ1 
> compared to AZ2. Similarly for customer 2 data it is more data locality in 
> AZ1 compared to AZ1
>
> In an ideal world, the data of the customers would look something like this
>
>
> Availability zone 1
> Disk usage for data of customer 1 is 50%
> Disk usage for data of customer 2 is 50%
>
> Availability zone 2
> Disk usage for data of customer 1 is 50%
> Disk usage for data of customer 2 is 50%
>
>
> HDFS Balancer looks related, however I have some questions:
>
> 1. Why does the algorithm tries to pair an over utilized node with under 
> utilized instead of every node holding average data?
> (https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/data-storage/content/step_2__storage_group_pairing.html)
>
> 2. Where can I find more algorithmic details of how the pairing happens?
>
> 3. Is this the only balancing algorithm supported by HDFS?
>
> Thanks




CVE-2023-26031: Privilege escalation in Apache Hadoop Yarn container-executor binary on Linux systems

2023-11-15 Thread Masatake Iwasaki
Severity: critical

Affected versions:

- Apache Hadoop 3.3.1 before 3.3.5

Description:

Relative library resolution in linux container-executor binary in
Apache Hadoop 3.3.1-3.3.4 on Linux allows local user to gain root
privileges. If the YARN cluster is accepting work from remote
(authenticated) users, this MAY permit remote users to gain root
privileges.

Hadoop 3.3.0 updated the " YARN Secure Containers
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/SecureContainer.html
" to add a feature for executing user-submitted applications in
isolated linux containers.

The native binary HADOOP_HOME/bin/container-executor is used to launch
these containers; it must be owned by root and have the suid bit set
in order for the YARN processes to run the containers as the specific
users submitting the jobs.

The patch " YARN-10495
https://issues.apache.org/jira/browse/YARN-10495 . make the rpath of
container-executor configurable" modified the library loading path for
loading .so files from "$ORIGIN/" to "$ORIGIN/:../lib/native/". This
is a path through which libcrypto.so is located. Thus it is
possible for a user with reduced privileges to install a malicious
libcrypto library into a path to which they have write access, invoke
the container-executor command, and have their modified library
executed as root.
If the YARN cluster is accepting work from remote (authenticated)
users, and these users' submitted job are executed in the physical
host, rather than a container, then the CVE permits remote users to
gain root privileges.

The fix for the vulnerability is to revert the change, which is done
in  YARN-11441 https://issues.apache.org/jira/browse/YARN-11441 ,
"Revert YARN-10495". This patch is in hadoop-3.3.5.

To determine whether a version of container-executor is vulnerable,
use the readelf command. If the RUNPATH or RPATH value contains the
relative path "./lib/native/" then it is at risk:

$ readelf -d container-executor|grep 'RUNPATH\|RPATH'
0x001d (RUNPATH)  Library runpath: [$ORIGIN/:../lib/native/]

If it does not, then it is safe:

$ readelf -d container-executor|grep 'RUNPATH\|RPATH'
0x001d (RUNPATH)  Library runpath: [$ORIGIN/]

For an at-risk version of container-executor to enable privilege
escalation, the owner must be root and the suid bit must be set

$ ls -laF /opt/hadoop/bin/container-executor
---Sr-s---. 1 root hadoop 802968 May 9 20:21 /opt/hadoop/bin/container-executor

A safe installation lacks the suid bit; ideally is also not owned by root.

$ ls -laF /opt/hadoop/bin/container-executor
-rwxr-xr-x. 1 yarn hadoop 802968 May 9 20:21 /opt/hadoop/bin/container-executor

This configuration does not support Yarn Secure Containers, but all
other hadoop services, including YARN job execution outside secure
containers continue to work.

This issue is being tracked as YARN-11441

Required Configurations:

The owner of the container-executor binary must be set to "root" and
the suid bit set such that callers would execute the binary as root. These
operations are a requirement for "YARN Secure Containers".

In an installation using the hadoop.tar.gz file the binary's owner is
that of the installing user, and without the suid permission it is not
at risk.

However, Apache BIgtop installations set the owner and permissions
such that installations may be vulnerable

The container-executor binary is only vulnerable on some Hadoop/Bigtop
releases. It is possible to verify whether a version is vulnerable
using the readelf command.

Work Arounds:

*  Upgrade to Apache Hadoop 3.3.5
*  If Yarn Secure Containers are not required, remove all execute
permissions on bin/container-executor ; change its owner from root, or
simply delete it.
*  If Yarn Secure Containers are required on a vulnerable release and
upgrade is not possible, replace the container-executor binary with
that of the 3.3.5 release.

As most Hadoop installations do not use Yarn Secure Containers,
removing execute permissions from the container-executor binary is
sufficient to secure the systems; deletion ensures that no security
scanners will report the issue.

Credit:

Esa Hiltunen (finder)
Mikko Kortelainen (finder)
The Teragrep Project (sponsor)

References:

https://issues.apache.org/jira/browse/YARN-11441
https://hadoop.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-26031




Details about cluster balancing

2023-11-05 Thread Akash Jain
Hello,

For my project, I am analyzing an algorithm to balance the disk usage across 
thousands of storage nodes across different availability zones.

Let’s say 
Availability zone 1
Disk usage for data of customer 1 is 70%
Disk usage for data of customer 2 is 10%

Availability zone 2
Disk usage for data of customer 1 is 30%
Disk usage for data of customer 2 is 90%

and so forth…

Clearly, in the above example, customer 1's data has much higher data locality
in AZ1 compared to AZ2. Similarly, customer 2's data has higher data locality in
AZ2 compared to AZ1.

In an ideal world, the data of the customers would look something like this 


Availability zone 1
Disk usage for data of customer 1 is 50%
Disk usage for data of customer 2 is 50%

Availability zone 2
Disk usage for data of customer 1 is 50%
Disk usage for data of customer 2 is 50%


HDFS Balancer looks related, however I have some questions:

1. Why does the algorithm try to pair an over-utilized node with an
under-utilized one, instead of making every node hold the average amount of data?
(https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/data-storage/content/step_2__storage_group_pairing.html)

2. Where can I find more algorithmic details of how the pairing happens?

3. Is this the only balancing algorithm supported by HDFS?

Thanks
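
On question 1: as far as I understand it, the Balancer does not try to drive
every node to exactly the average; it only classifies nodes against the cluster
average plus or minus the configured threshold and pairs sources with targets,
which bounds how much data has to move while still bringing every node inside
the band. A rough, simplified sketch of that pairing idea (single utilization
dimension, not the actual Balancer code):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BalancerPairingSketch {
  static class Node {
    final String name;
    final double utilization;  // percent of capacity used
    Node(String name, double utilization) { this.name = name; this.utilization = utilization; }
  }

  static List<String> plan(List<Node> nodes, double threshold) {
    double avg = nodes.stream().mapToDouble(n -> n.utilization).average().orElse(0);
    Deque<Node> over = new ArrayDeque<>();   // source candidates
    Deque<Node> under = new ArrayDeque<>();  // target candidates
    for (Node n : nodes) {
      if (n.utilization > avg + threshold) over.add(n);
      else if (n.utilization < avg - threshold) under.add(n);
      // nodes already within +/- threshold of the average are left alone,
      // which is why the goal is "close to average", not "exactly average"
    }
    List<String> moves = new ArrayList<>();
    while (!over.isEmpty() && !under.isEmpty()) {
      moves.add(over.poll().name + " -> " + under.poll().name);
    }
    return moves;
  }

  public static void main(String[] args) {
    List<Node> az = List.of(new Node("dn1", 70), new Node("dn2", 10),
                            new Node("dn3", 30), new Node("dn4", 90));
    System.out.println(plan(az, 10.0));  // prints [dn1 -> dn2, dn4 -> dn3]
  }
}

The real Balancer is more elaborate (it also tracks above-average and
below-average groups, works per storage type, and limits how much each node
moves per iteration), but the threshold-versus-average classification is the
core of the pairing.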

Re: Regarding the 20-minute delay problem when using libhdfs3.so in hadoop-3.3.1 version to access the hdfs federation mode router node rpc port

2023-10-30 Thread Xiaoqiao He
Add hdfs-dev@h.a.o and user@h.a.o

On Thu, Oct 26, 2023 at 7:07 PM 王继泽  wrote:

> Recently, I ran into a situation while using Hadoop.
> When I use the C API to send a request to the RPC port of a router node in
> HDFS federation mode, such as writing a file, then after the client finishes
> the request the Hadoop side needs a 20-minute delay before the file shows a
> non-zero size, and the file cannot be operated on during the delay.
>
> After the client run finishes, the rough sequence in the Hadoop-side log is:
> 1. The NameNode receives the client's request, and FSEditLog prints the log.
> 2. blockmanager.BlockPlacementPolicy: reports that there are not enough
> replicas to choose from. Reason: {NO_REQUIRED_STORAGE_TYPE=1}
> 3. StateChange: allocates the block.
> 4. StateChange: obtains the lease for the file in the Hadoop directory.
> 5. ipc.Server: the lease-check method throws LeaseExpiredException: INode is
> not a regular file: /
> 6. (Waiting starts.)
> 7. After 20 minutes, the hard limit is reached and the lease is force-closed.
> 8. Lease recovery is triggered.
> 9. Only then does the operation succeed.
>
> I also suspected a client problem, but I ran several sets of tests (all using
> the C API to send write requests to Hadoop, abbreviated below):
> Version 3.3.1, router, RPC port.    --> 20-minute delay
> Version 3.3.1, namenode, RPC port.  --> no problem
> Version 3.3.1, router, HTTP port.   --> no problem
> Version 3.3.1, namenode, HTTP port. --> no problem
>
> Version 3.1.1, router, RPC port.    --> no problem
> Version 3.1.1, namenode, RPC port.  --> no problem
> Version 3.1.1, router, RPC port.    --> no problem
> Version 3.1.1, namenode, RPC port.  --> no problem
>
> Here is my guess:
> From the Hadoop log, it looks like with version 3.3.1, going through the
> router RPC port, the lease is never obtained properly at the beginning, so it
> cannot be closed normally and the write cannot complete until the hard limit
> is triggered. But I can't explain why the same client does not show this
> behavior with version 3.1.1. I suspect it is caused by an incompatibility
> between changes in the new version and some part of libhdfs3.so.
>
>
> If anyone finds a similar situation, I'd appreciate a reply pointing me in
> the right direction on this issue.
>
>
>
> 王继泽
> y98d...@163.com
>
>


Regarding the 20-minute delay problem when using libhdfs3.so in hadoop-3.3.1 version to access the hdfs federation mode router node rpc port

2023-10-26 Thread 王继泽
Recently, I ran into a situation while using Hadoop.
When I use the C API to send a request to the RPC port of a router node in HDFS 
federation mode, such as writing a file, then after the client finishes the 
request the Hadoop side needs a 20-minute delay before the file shows a non-zero 
size, and the file cannot be operated on during the delay.

After the client run finishes, the rough sequence in the Hadoop-side log is as 
follows:
1. The NameNode receives the client's request, and FSEditLog prints the log.
2. blockmanager.BlockPlacementPolicy: reports that there are not enough replicas 
to choose from. Reason: {NO_REQUIRED_STORAGE_TYPE=1}
3. StateChange: allocates the block.
4. StateChange: obtains the lease for the file in the Hadoop directory.
5. ipc.Server: the lease-check method throws LeaseExpiredException: INode is not 
a regular file: /
6. (Waiting starts.)
7. After 20 minutes, the hard limit is reached and the lease is force-closed.
8. Lease recovery is triggered.
9. Only then does the operation succeed.

I also suspected it was a client problem. But I did several sets of tests (all 
using C API to send write requests to Hadoop, abbreviated below)
Version 3.3.1, router, rpc port. --> There is a 20-minute delay
Version 3.3.1, namenode, rpc port. --> No problem
Version 3.3.1, router, http port. --> No problem
Version 3.3.1, namenode, http port. --> No problem

Version 3.1.1, router, rpc port. --> No problem
Version 3.1.1, namenode, rpc port. --> No problem
Version 3.1.1, router, rpc port. --> No problem
Version 3.1.1, namenode, rpc port. --> No problem

Here is my guess:
From the Hadoop log, it looks like with version 3.3.1, going through the router 
RPC port, the lease is never obtained properly at the beginning, so it cannot be 
closed normally and the write cannot complete until the hard limit is triggered. 
But I can't explain why the same client does not show this behavior with version 
3.1.1. I suspect it is caused by an incompatibility between changes in the new 
version and some part of libhdfs3.so.

If anyone finds a similar situation, I'd love a reply to point me in the 
direction of this issue.



王继泽
y98d...@163.com


