Re: Agenda & More Information about Hadoop Community Meetup @ Palo Alto, June 26

2019-06-25 Thread Weiwei Yang
Thanks, Wangda.
Will this event be recorded? A recording would be extremely helpful for people
who are unable to join, so they can catch up later.

Thanks
Weiwei
On Jun 26, 2019, 4:12 AM +0800, Wangda Tan wrote:
A friendly reminder,

The meetup will take place tomorrow from 9:00 AM to 4:00 PM PDT.

The address is: 395 Page Mill Rd, Palo Alto, CA 94306
We’ll be in the Bigtop conference room on the 1st floor. Go left after
coming through the main entrance, and it will be on the right.

Zoom: https://cloudera.zoom.us/j/606607666

Please let me know if you have any questions. If you haven't RSVP'd yet,
please go ahead and do so, so we can better plan food, seating, etc.

Thanks,
Wangda

On Wed, Jun 19, 2019 at 4:49 PM Wangda Tan  wrote:

Hi All,

I want to let you know that we have confirmed most of the agenda for the
Hadoop Community Meetup. It will be a full-day event.

Agenda & dial-in info are below. Please RSVP at
https://www.meetup.com/Hadoop-Contributors/events/262055924/

Huge thanks to Daniel Templeton, Wei-Chiu Chuang, and Christina Vu for helping
with organizing and logistics.

Please help promote the meetup information on Twitter, LinkedIn, etc.
Much appreciated!

Best,
Wangda

AM:

9:00: Arrival and check-in

9:30 - 10:15: Talk: Hadoop storage in cloud-native environments
Abstract: Hadoop is a mature storage system but was designed years before
the cloud-native movement. Kubernetes and other cloud-native tools are
emerging solutions for containerized environments, but they sometimes
require different approaches. In this presentation we would like to share
our experience running Apache Hadoop Ozone in Kubernetes and its connection
points to other cloud-native ecosystem elements. We will compare the
benefits and drawbacks of using Kubernetes and Hadoop storage together and
show our current achievements and future plans.
Speaker: Marton Elek (Cloudera)

10:20 - 11:00: Talk: Selective Wire Encryption in HDFS
Abstract: Wire data encryption is a key component of the Hadoop Distributed
File System (HDFS). However, such encryption enforcement comes in as an
all-or-nothing feature. In our use case at LinkedIn, we would like to
selectively expose fast unencrypted access to fully managed internal
clients, which can be trusted, while exposing only encrypted access to
clients outside of the trusted circle, with higher security risks. That way
we minimize the performance overhead for trusted internal clients while
still securing data from potential outside threats. Our design extends the
HDFS NameNode to run on multiple ports; connecting to different NameNode
ports results in different levels of encryption protection. This protection
is then enforced for both NameNode RPC and the subsequent data transfers
to/from DataNodes. This approach comes with minimal operational and
performance overhead.
Speakers: Konstantin Shvachko (LinkedIn), Chen Liang (LinkedIn)

11:10 - 11:55: Talk: YuniKorn: Next Generation Scheduling for YARN and K8s
Abstract: We will talk about our open source work, the YuniKorn scheduler
project (Y for YARN, K for K8s, uni- for Unified). It brings long-wanted
features such as hierarchical queues, fairness between users/jobs/queues,
and preemption to Kubernetes, and it brings service scheduling enhancements
to YARN. Any improvement to this scheduler can benefit both the Kubernetes
and YARN communities.
Speaker: Wangda Tan (Cloudera)

PM:

12:00 - 12:55: Lunch break (provided by Cloudera)

1:00 - 1:25: Talk: YARN Efficiency at Uber
Abstract: We will present the work done at Uber to improve YARN cluster
utilization and job SOA with elastic resource management, low compute
workload on the passive datacenter, preemption, larger containers, etc. We
will also go through the YARN upgrade we did to adopt new features and talk
about the challenges.
Speakers: Aihua Xu (Uber), Prashant Golash (Uber)

1:30 - 2:10: One more talk

2:20 - 4:00: BoF sessions, breakout sessions & group discussions: items
like JDK 11 support, next releases (2.10.0, 3.3.0, etc.), Hadoop on Cloud,
etc.

4:00: Reception provided by Cloudera

==
Join Zoom Meeting: https://cloudera.zoom.us/j/116816195



Re: Agenda & More Information about Hadoop Community Meetup @ Palo Alto, June 26

2019-06-25 Thread Wangda Tan
A friendly reminder,

The meetup will take place tomorrow from 9:00 AM to 4:00 PM PDT.

The address is: 395 Page Mill Rd, Palo Alto, CA 94306
We’ll be in the Bigtop conference room on the 1st floor. Go left after
coming through the main entrance, and it will be on the right.

Zoom: https://cloudera.zoom.us/j/606607666

Please let me know if you have any questions. If you haven't RSVP'd yet,
please go ahead and do so, so we can better plan food, seating, etc.

Thanks,
Wangda

On Wed, Jun 19, 2019 at 4:49 PM Wangda Tan  wrote:

> Hi All,
>
> I want to let you know that we have confirmed most of the agenda for the
> Hadoop Community Meetup. It will be a full-day event.
>
> Agenda & dial-in info are below. Please RSVP at
> https://www.meetup.com/Hadoop-Contributors/events/262055924/
>
> Huge thanks to Daniel Templeton, Wei-Chiu Chuang, and Christina Vu for helping
> with organizing and logistics.
>
> Please help promote the meetup information on Twitter, LinkedIn, etc.
> Much appreciated!
>
> Best,
> Wangda
> AM:
>
> 9:00: Arrival and check-in
>
> 9:30 - 10:15: Talk: Hadoop storage in cloud-native environments
> Abstract: Hadoop is a mature storage system but was designed years before
> the cloud-native movement. Kubernetes and other cloud-native tools are
> emerging solutions for containerized environments, but they sometimes
> require different approaches. In this presentation we would like to share
> our experience running Apache Hadoop Ozone in Kubernetes and its connection
> points to other cloud-native ecosystem elements. We will compare the
> benefits and drawbacks of using Kubernetes and Hadoop storage together and
> show our current achievements and future plans.
> Speaker: Marton Elek (Cloudera)
>
> 10:20 - 11:00: Talk: Selective Wire Encryption in HDFS
> Abstract: Wire data encryption is a key component of the Hadoop Distributed
> File System (HDFS). However, such encryption enforcement comes in as an
> all-or-nothing feature. In our use case at LinkedIn, we would like to
> selectively expose fast unencrypted access to fully managed internal
> clients, which can be trusted, while exposing only encrypted access to
> clients outside of the trusted circle, with higher security risks. That way
> we minimize the performance overhead for trusted internal clients while
> still securing data from potential outside threats. Our design extends the
> HDFS NameNode to run on multiple ports; connecting to different NameNode
> ports results in different levels of encryption protection. This protection
> is then enforced for both NameNode RPC and the subsequent data transfers
> to/from DataNodes. This approach comes with minimal operational and
> performance overhead.
> Speakers: Konstantin Shvachko (LinkedIn), Chen Liang (LinkedIn)
>
> 11:10 - 11:55: Talk: YuniKorn: Next Generation Scheduling for YARN and K8s
> Abstract: We will talk about our open source work, the YuniKorn scheduler
> project (Y for YARN, K for K8s, uni- for Unified). It brings long-wanted
> features such as hierarchical queues, fairness between users/jobs/queues,
> and preemption to Kubernetes, and it brings service scheduling enhancements
> to YARN. Any improvement to this scheduler can benefit both the Kubernetes
> and YARN communities.
> Speaker: Wangda Tan (Cloudera)
>
> PM:
>
> 12:00 - 12:55: Lunch break (provided by Cloudera)
>
> 1:00 - 1:25: Talk: YARN Efficiency at Uber
> Abstract: We will present the work done at Uber to improve YARN cluster
> utilization and job SOA with elastic resource management, low compute
> workload on the passive datacenter, preemption, larger containers, etc. We
> will also go through the YARN upgrade we did to adopt new features and talk
> about the challenges.
> Speakers: Aihua Xu (Uber), Prashant Golash (Uber)
>
> 1:30 - 2:10: One more talk
>
> 2:20 - 4:00: BoF sessions, breakout sessions & group discussions: items
> like JDK 11 support, next releases (2.10.0, 3.3.0, etc.), Hadoop on Cloud,
> etc.
>
> 4:00: Reception provided by Cloudera
>
> ==
> Join Zoom Meeting: https://cloudera.zoom.us/j/116816195
>


Re: NVMe Over fabric performance on HDFS

2019-06-25 Thread Wei-Chiu Chuang
There are a few Intel folks who contributed NVMe-related features to HDFS. They
are probably the best source for these questions.

Without access to the NVMe hardware, it is hard to tell. I learned that GCE
offers instances with Intel Optane DC Persistent Memory attached; those could
be used for tests if anyone is interested.

I personally have not received reports of unexpected performance issues with
NVMe on HDFS. A lot of test tuning could result in better performance; file
size can have a great impact on a TestDFSIO run, for example. You should also
make sure you saturate the local NVMe rather than the network bandwidth. Try
setting the replication factor to 1? With the default replication factor you
pretty much saturate the network rather than the storage, I would guess.
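
For example, something along these lines could be used to drive a
single-replica write from the client and time it. This is only a rough
sketch, not a tuned benchmark; the NameNode URI, path, and sizes are
placeholders:

// Rough sketch: write a test file with replication factor 1 so the write
// pipeline stays on one DataNode and measures storage rather than the
// replication network path. URI, path, and sizes are placeholders.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleReplicaWriteTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("dfs.replication", 1);   // client-side default replication

    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    Path testFile = new Path("/benchmarks/single-replica-test.dat");

    byte[] buffer = new byte[1 << 20];   // 1 MiB per write call
    long totalBytes = 4L << 30;          // 4 GiB total

    long start = System.nanoTime();
    try (FSDataOutputStream out = fs.create(testFile, true)) {
      for (long written = 0; written < totalBytes; written += buffer.length) {
        out.write(buffer);
      }
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    System.out.printf("Wrote %d bytes in %.1f s (%.1f MB/s)%n",
        totalBytes, seconds, totalBytes / seconds / 1e6);
  }
}

With replication = 1 each block lands on a single DataNode, so the numbers
reflect the NVMe-oF target rather than the replication pipeline over the
network.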

The Intel folks elected to implement DCPMM support as an HDFS cache rather than
as storage. There's probably some consideration behind that.

On Tue, Jun 25, 2019 at 10:29 AM Daegyu Han  wrote:

> Hi Anu,
>
> Each datanode has its own Samsung NVMe SSD, which resides on the storage
> node. In other words, the compute nodes and storage (NVMe SSDs) are simply
> separated.
>
> I know that the maximum bandwidth of my Samsung NVMe SSD is about 3 GB/s.
>
> Experimental results from TestDFSIO and HDFS_API show that the local NVMe
> SSD reaches up to 2 GB/s, while the NVMe-oF SSD reaches 500~800 MB/s. Even
> IPoIB over InfiniBand achieves a bandwidth of 1 GB/s.
>
> In research papers evaluating NVMe-oF with FIO or KV-store applications,
> the performance of NVMe-oF is similar to that of a local SSD. They also
> say that, to bring NVMe-oF performance up to the local level, parallel IO
> is required.
> Why is the IO bandwidth of NVMe-oF in HDFS not as good as local?
>
> Regards,
> Daegyu
>
> On Wed, Jun 26, 2019 at 12:04 AM, Anu Engineer wrote:
> >
> > Is your NVMe shared, with all datanodes sending I/O to the same set of
> > disks? Is it possible for you to see the I/O queue length of the NVMe
> > devices?
> > I would suggest that you try to find out what is causing the perf issue;
> > once we know in the ballpark where the issue is -- that is, whether it is
> > the disks or HDFS -- it might be possible to see what we can do.
> >
> >
> >
> > Thanks
> > Anu
> >
> >
> > On Tue, Jun 25, 2019 at 7:20 AM Daegyu Han  wrote:
> >>
> >> Hi all,
> >>
> >> I am using storage disaggregation by mounting NVMe SSDs on the storage
> >> node.
> >>
> >> When we connect the compute node and the storage node with NVMe over
> >> Fabrics (NVMe-oF) and test it, performance is much lower than that of
> >> local storage (DAS).
> >>
> >> In general, we know that applications need to increase IO parallelism
> >> and IO size to improve the performance of NVMe-oF.
> >>
> >> How can I change the settings of HDFS specifically to improve the IO
> >> performance of NVMe-oF in HDFS?
> >>
> >> Best regards,
> >> Daegyu


Re: NVMe Over fabric performance on HDFS

2019-06-25 Thread Daegyu Han
Hi Anu,

Each datanode has its own Samsung NVMe SSD, which resides on the storage
node. In other words, the compute nodes and storage (NVMe SSDs) are simply
separated.

I know that the maximum bandwidth of my Samsung NVMe SSD is about 3 GB/s.

Experimental results from TestDFSIO and HDFS_API show that the local NVMe
SSD reaches up to 2 GB/s, while the NVMe-oF SSD reaches 500~800 MB/s. Even
IPoIB over InfiniBand achieves a bandwidth of 1 GB/s.

In research papers evaluating NVMe-oF with FIO or KV-store applications,
the performance of NVMe-oF is similar to that of a local SSD. They also say
that, to bring NVMe-oF performance up to the local level, parallel IO is
required.
Why is the IO bandwidth of NVMe-oF in HDFS not as good as local?

Regards,
Daegyu

On Wed, Jun 26, 2019 at 12:04 AM, Anu Engineer wrote:
>
> Is your NVMe shared, with all datanodes sending I/O to the same set of disks?
> Is it possible for you to see the I/O queue length of the NVMe devices?
> I would suggest that you try to find out what is causing the perf issue;
> once we know in the ballpark where the issue is -- that is, whether it is
> the disks or HDFS -- it might be possible to see what we can do.
>
>
>
> Thanks
> Anu
>
>
> On Tue, Jun 25, 2019 at 7:20 AM Daegyu Han  wrote:
>>
>> Hi all,
>>
>> I am using storage disaggregation by mounting NVMe SSDs on the storage node.
>>
>> When we connect the compute node and the storage node with NVMe over
>> Fabrics (NVMe-oF) and test it, performance is much lower than that of
>> local storage (DAS).
>>
>> In general, we know that applications need to increase IO parallelism
>> and IO size to improve the performance of NVMe-oF.
>>
>> How can I change the settings of HDFS specifically to improve the IO
>> performance of NVMe-oF in HDFS?
>>
>> Best regards,
>> Daegyu
>>

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Re: NVMe Over fabric performance on HDFS

2019-06-25 Thread Anu Engineer
Is your NVMe shared, with all datanodes sending I/O to the same set of disks?
Is it possible for you to see the I/O queue length of the NVMe devices?
I would suggest that you try to find out what is causing the perf issue;
once we know in the ballpark where the issue is -- that is, whether it is the
disks or HDFS -- it might be possible to see what we can do.



Thanks
Anu


On Tue, Jun 25, 2019 at 7:20 AM Daegyu Han  wrote:

> Hi all,
>
> I am using storage disaggregation by mounting NVMe SSDs on the storage
> node.
>
> When we connect the compute node and the storage node with NVMe over
> Fabrics (NVMe-oF) and test it, performance is much lower than that of
> local storage (DAS).
>
> In general, we know that applications need to increase IO parallelism
> and IO size to improve the performance of NVMe-oF.
>
> How can I change the settings of HDFS specifically to improve the IO
> performance of NVMe-oF in HDFS?
>
> Best regards,
> Daegyu
>


NVMe Over fabric performance on HDFS

2019-06-25 Thread Daegyu Han
Hi all,

I am using storage disaggregation by mounting NVMe SSDs on the storage node.

When we connect the compute node and the storage node with NVMe over
Fabrics (NVMe-oF) and test it, performance is much lower than that of
local storage (DAS).

In general, we know that applications need to increase IO parallelism
and IO size to improve the performance of NVMe-oF.
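
For example, a rough client-side sketch of what more parallel, larger reads
could look like with the HDFS API is below; the NameNode URI, file list, and
sizes are placeholders, just to illustrate the idea:

// Rough sketch: read several HDFS files in parallel with a large buffer,
// to increase IO parallelism and IO size. URI, paths, and sizes are
// placeholders.
import java.net.URI;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelHdfsReadTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

    List<Path> files = List.of(
        new Path("/benchmarks/part-0"),
        new Path("/benchmarks/part-1"),
        new Path("/benchmarks/part-2"),
        new Path("/benchmarks/part-3"));

    ExecutorService pool = Executors.newFixedThreadPool(files.size());
    for (Path p : files) {
      pool.submit(() -> {
        byte[] buf = new byte[8 << 20];  // 8 MiB read requests
        try (FSDataInputStream in = fs.open(p)) {
          while (in.read(buf) > 0) {
            // drain the stream; only throughput matters here
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}

Each thread keeps one large sequential read outstanding against the
NVMe-oF-backed DataNode.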

How can I change the settings of HDFS specifically to improve the IO
performance of NVMe-oF in HDFS?

Best regards,
Daegyu

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Build breaks on FreeBSD

2019-06-25 Thread Yuri
I am trying to fix the FreeBSD port for Hadoop. It almost builds, but
breaks at the end because it can't find the "./share" directory:
https://issues.apache.org/jira/browse/HADOOP-16388



Could somebody please help with this?


Thank you,

Yuri



-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org