Re: Official Apache Slack Channel for Hadoop projects

2019-10-11 Thread Yufei Gu
Thanks Wei-Chiu. I just joined both the hdfs and yarn channels. Yes, there is
a yarn channel; it only has 3 members so far.

Best,

Yufei

`This is not a contribution`


On Fri, Oct 11, 2019 at 4:35 PM Wei-Chiu Chuang  wrote:

> Hi Hadoop devs,
>
> In case you don't know, there is an official ASF Slack, and there's an HDFS
> channel in it. This is the Slack workspace managed by Apache Infra.
>
> Please see this wiki to get an invite:
> https://cwiki.apache.org/confluence/display/INFRA/Slack+Guest+Invites
> or DM me for one.
>
> Once you get access to the ASF workspace, search for the #hdfs channel. There
> are also #ozone, #hadoop, #submarine-dev and #submarine-user channels. I
> don't see a #yarn channel, but I can create one (I'm not sure who is eligible
> to create channels: PMC, committers, or anyone?)
>
> We will not use the Slack channels to vote on project decisions, but they
> might be an easier way to find me. Right now the channels are quite dry.
> Let's see if we can revive them.
>
> Weichiu
>


Re: [DISCUSS] A unified and open Hadoop community sync up schedule?

2019-06-11 Thread Yufei Gu
+1 for this idea. Thanks Wangda for bringing this up.

Some comments to share:

   - The agenda needs to be posted ahead of the meeting, and any interested
   party is welcome to contribute topics.
   - We should encourage more people to attend. That's the whole point of the
   meeting.
   - Hopefully, this can mitigate the situation where some patches wait for
   review forever, which turns away new contributors.
   - 30 minutes per session sounds a little short; we can try it out and see
   whether an extension is needed.

Best,

Yufei

`This is not a contribution`


On Fri, Jun 7, 2019 at 4:39 PM Wangda Tan  wrote:

> Hi Hadoop-devs,
>
> Previously we had a regular YARN community sync-up (1 hr, biweekly, but not
> open to the public). Recently, because of changes in our schedules, fewer
> folks have shown up at the sync-up over the last several months.
>
> I saw that the K8s community does a pretty good job of running their SIG
> meetings: there are regular meetings for different topics, with notes,
> agendas, etc. Such as
>
> https://docs.google.com/document/d/13mwye7nvrmV11q9_Eg77z-1w3X7Q1GTbslpml4J7F3A/edit
>
>
> For the Hadoop community, there are fewer such regular meetings open to the
> public, apart from the Ozone project and offline meetups or Birds-of-a-Feather
> sessions at Hadoop/DataWorks Summit. Recently a few folks joined DataWorks
> Summit in Washington DC and Barcelona, and lots (50+) of folks joined the
> Ozone/Hadoop/YARN BoFs, asked (good) questions and discussed roadmaps. I think
> it is important to open such conversations to the public and let more
> folks/companies join.
>
> A small group of community members discussed and wrote a short proposal
> about the form, time and topics of the community sync-ups. Thanks to
> everybody who contributed to the proposal! Please feel free to add
> your thoughts to the Proposal Google doc
> <https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#>.
>
> Especially for the following parts:
> - If you are interested in running any of the community sync-ups, please put
> your name in the table inside the proposal. We need more volunteers to help
> run the sync-ups in different timezones.
> - Please add suggestions on the time, frequency and themes, and feel free to
> share your thoughts on whether we should do sync-ups for other topics that
> are not covered by the proposal.
>
> Link to the Proposal Google doc
> <https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#>
>
> Thanks,
> Wangda Tan
>


Re: [VOTE] Release Apache Hadoop 3.0.2 (RC0)

2018-04-06 Thread Yufei Gu
Thanks Lei for working on this!

+1 (non-binding)

   - Downloaded the binary tarball and verified the checksum.
   - Started a pseudo cluster inside one docker container
   - Ran the Resource Manager with the Fair Scheduler
   - Verified distributed shell
   - Verified the mapreduce pi job
   - Sanity checked the RM WebUI (rough commands for the checksum and pi
   checks are sketched below)
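
For reference, roughly the commands behind those two checks (the tarball name
and checksum file here are illustrative, not necessarily the exact artifacts
in the RC directory):

    # fetch the tarball and compare its digest against the published checksum
    wget http://home.apache.org/~lei/hadoop-3.0.2-RC0/hadoop-3.0.2.tar.gz
    sha512sum hadoop-3.0.2.tar.gz   # compare with the checksum file on the RC page
    tar xzf hadoop-3.0.2.tar.gz && cd hadoop-3.0.2

    # with the pseudo cluster running, submit the example pi job as a smoke test
    bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.2.jar pi 4 1000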

Best,

Yufei

On Fri, Apr 6, 2018 at 11:16 AM, Lei Xu  wrote:

> Hi, All
>
> I've created release candidate RC-0 for Apache Hadoop 3.0.2.
>
> Please note: this is an amendment to the Apache Hadoop 3.0.1 release to
> fix the shaded jars in the Apache maven repository. The codebase of the
> 3.0.2 release is the same as 3.0.1.  New bug fixes will be included in
> Apache Hadoop 3.0.3 instead.
>
> The release page is:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release
>
> New RC is available at: http://home.apache.org/~lei/hadoop-3.0.2-RC0/
>
> The git tag is release-3.0.2-RC0, and the latest commit is
> 5c141f7c0f24c12cb8704a6ccc1ff8ec991f41ee
>
> The maven artifacts are available at
> https://repository.apache.org/content/repositories/orgapachehadoop-1096/
>
> Please try the release, especially *verify the maven artifacts*, and vote.
>
> The vote will run 5 days, ending 4/11/2018.
>
> Thanks to everyone who helped spot the error and propose fixes!
>
>
>


Re: [VOTE] Release Apache Hadoop 3.1.0 (RC1)

2018-03-29 Thread Yufei Gu
 Thanks Wangda for working on this!

+1 (non-binding)

   - Downloaded the binary tarball and verified the checksum.
   - Started a pseudo cluster inside one docker container
   - Ran the Resource Manager with the Fair Scheduler
   - Verified distributed shell (see the sketch below)
   - Verified the mapreduce pi job
   - Sanity checked the RM WebUI
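
Roughly how the distributed shell check was run (the jar path and options are
illustrative and depend on the unpacked release layout):

    # submit a trivial distributed shell app; the AM launches 2 containers running `date`
    DSHELL_JAR=share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0.jar
    bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
        -jar "$DSHELL_JAR" \
        -shell_command date \
        -num_containers 2

    # confirm the application finished successfully
    bin/yarn application -list -appStates FINISHED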

Best,

Yufei

On Thu, Mar 29, 2018 at 9:15 PM, Wangda Tan  wrote:

> Hi folks,
>
> Thanks to the many who helped with this release since Dec 2017 [1]. We've
> created RC1 for Apache Hadoop 3.1.0. The artifacts are available here:
>
> http://people.apache.org/~wangda/hadoop-3.1.0-RC1
>
> The RC tag in git is release-3.1.0-RC1. Last git commit SHA is
> 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
>
> The maven artifacts are available via repository.apache.org at
> https://repository.apache.org/content/repositories/orgapachehadoop-1090/
> This vote will run 5 days, ending on Apr 3 at 11:59 pm Pacific.
>
> 3.1.0 contains 766 [2] fixed JIRA issues since 3.0.0. Notable additions
> include first-class GPU/FPGA support on YARN, native services, support for
> rich placement constraints in YARN, S3-related enhancements, allowing HDFS
> block replicas to be provided by an external storage system, etc.
>
> For 3.1.0 RC0 vote discussion, please see [3].
>
> We’d like to use this as the starting release for 3.1.x [1]; depending on how
> it goes, we will stabilize it and potentially put out a 3.1.1 in several weeks
> as the stable release.
>
> We have done testing with a pseudo cluster:
> - Ran distributed job.
> - GPU scheduling/isolation.
> - Placement constraints (intra-application anti-affinity) by using
> distributed shell.
>
> My +1 to start.
>
> Best,
> Wangda/Vinod
>
> [1]
> https://lists.apache.org/thread.html/b3fb3b6da8b6357a68513a6dfd104bc9e19e559aedc5ebedb4ca08c8@%3Cyarn-dev.hadoop.apache.org%3E
> [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.1.0)
> AND fixVersion not in (3.0.0, 3.0.0-beta1) AND status = Resolved ORDER BY
> fixVersion ASC
> [3]
> https://lists.apache.org/thread.html/b3a7dc075b7329fd660f65b48237d72d4061f26f83547e41d0983ea6@%3Cyarn-dev.hadoop.apache.org%3E
>


Re: [VOTE] Release Apache Hadoop 3.0.1 (RC1)

2018-03-20 Thread Yufei Gu
Thanks Eddy!

+1 (non-binding)

   - Downloaded hadoop-3.0.1.tar.gz from
http://home.apache.org/~lei/hadoop-3.0.1-RC1/
   - Started a pseudo cluster inside one docker container (rough commands below)
   - Verified distributed shell
   - Verified the mapreduce pi job
   - Sanity checked the RM WebUI
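
For reference, the pseudo cluster was just a single container along these
lines (the base image, ports and exact steps are illustrative):

    # single-node cluster in one container, with the RM and NN web UIs exposed
    docker run -it --name hadoop-3.0.1-rc1 -p 8088:8088 -p 9870:9870 ubuntu:16.04 bash

    # inside the container (after installing a JDK and passwordless ssh to localhost):
    tar xzf hadoop-3.0.1.tar.gz && cd hadoop-3.0.1
    bin/hdfs namenode -format
    sbin/start-dfs.sh && sbin/start-yarn.sh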

Best,

Yufei

On Tue, Mar 20, 2018 at 9:32 AM, Eric Payne 
wrote:

>  Thanks for working on this release!
> +1 (binding)
> I tested the following:
> - yarn distributed shell job
>
> - yarn streaming job
>
> - inter-queue preemption
>
> - compared behavior of fair and fifo ordering policy
>
> - both userlimit_first mode and priority_first mode of intra-queue
> preemption
>
> Eric Payne
>
>
>
> On Saturday, March 17, 2018, 11:11:32 PM CDT, Lei Xu 
> wrote:
>
>  Hi, all
>
> I've created release candidate RC-1 for Apache Hadoop 3.0.1
>
> Apache Hadoop 3.0.1 will be the first bug fix release for the Apache
> Hadoop 3.0 release line. It includes 49 bug fixes and security fixes,
> of which 12 are blockers and 17 are critical.
>
> Please note:
> * HDFS-12990. Change the default NameNode RPC port back to 8020. This is an
> incompatible change relative to Hadoop 3.0.0.  After 3.0.1 is released, Apache
> Hadoop 3.0.0 will be deprecated due to this change.
>
> The release page is:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release
>
> New RC is available at: http://home.apache.org/~lei/hadoop-3.0.1-RC1/
>
> The git tag is release-3.0.1-RC1, and the latest commit is
> 496dc57cc2e4f4da117f7a8e3840aaeac0c1d2d0
>
> The maven artifacts are available at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1081/
>
> Please try the release and vote; the vote will run for the usual 5
> days, ending on 3/22/2018 at 6pm PST.
>
> Thanks!
>
>
>


Re: [DISCUSS] official docker image(s) for hadoop

2017-09-13 Thread Yufei Gu
It would be very helpful for testing RCs. To vote on an RC, committers and
PMC members usually spend a lot of time compiling and deploying the RC and
running several sanity tests before giving a +1. A docker image would save
the compilation and deployment time, so people could do more testing.
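
For example, with an official image the compile/deploy part could shrink to
something like this (the image name and tag are hypothetical; no such image
is published today):

    # pull a prebuilt image for the RC instead of building and deploying by hand
    RC_IMAGE=apache/hadoop:3.0.0-beta1-RC0     # hypothetical image/tag
    docker pull "$RC_IMAGE"
    docker run -d --name hadoop-rc -p 8088:8088 "$RC_IMAGE"

    # run a quick smoke test inside the running container (paths depend on the image layout)
    docker exec hadoop-rc bin/yarn jar \
        share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-beta1.jar pi 4 1000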

Best,

Yufei

On Wed, Sep 13, 2017 at 11:19 AM, Wangda Tan  wrote:

> +1 to adding a Hadoop docker image for easier testing / prototyping, it's
> gonna be super helpful!
>
> Thanks,
> Wangda
>
> On Wed, Sep 13, 2017 at 10:48 AM, Miklos Szegedi <
> miklos.szeg...@cloudera.com> wrote:
>
> > Marton, thank you for working on this. I think official Docker images for
> > Hadoop would be very useful for a lot of reasons. I think it is better to
> > have a coordinated effort, with production-ready base images and dependent
> > images for prototyping. Does anyone else have an opinion about this?
> >
> > Thank you,
> > Miklos
> >
> > On Fri, Sep 8, 2017 at 5:45 AM, Marton, Elek  wrote:
> >
> > >
> > > TL;DR: I propose to create official Hadoop images and upload them to
> > > Docker Hub.
> > >
> > > GOAL/SCOPE: I would like to improve the existing documentation with
> > > easy-to-use docker-based recipes to start Hadoop clusters with various
> > > configurations.
> > >
> > > The images could also be used to test experimental features. For example,
> > > Ozone could be tested easily with this compose file and configuration:
> > >
> > > https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6
> > >
> > > Or the configuration could even be included in the compose file:
> > >
> > > https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml
> > >
> > > I would like to create separate example compose files for federation, HA,
> > > metrics usage, etc. to make it easier to try out and understand the
> > > features.
> > >
> > > CONTEXT: There is an existing Jira,
> > > https://issues.apache.org/jira/browse/HADOOP-13397,
> > > but it is about a tool to generate production-quality docker images
> > > (multiple types, in a flexible way). If there are no objections, I will
> > > create a separate issue for simplified docker images for rapid prototyping
> > > and investigating new features, and register the branch with Docker Hub to
> > > build the images automatically.
> > >
> > > MY BACKGROUND: I have been working with docker-based hadoop/spark clusters
> > > for quite a while and have run them successfully in different environments
> > > (kubernetes, docker-swarm, nomad-based scheduling, etc.). My work is
> > > available from here: https://github.com/flokkr but those setups handle more
> > > complex use cases (e.g. instrumenting java processes with btrace, or
> > > reading/reloading configuration from consul).
> > > And IMHO it is better for the official hadoop documentation to suggest
> > > using official Apache docker images rather than external ones (which could
> > > change).
> > >
> > > Please let me know if you have any comments.
> > >
> > > Marton
> > >
> > >
> > >
> >
>


[jira] [Created] (HDFS-11143) start.sh doesn't return any error message even namenode is not up.

2016-11-15 Thread Yufei Gu (JIRA)
Yufei Gu created HDFS-11143:
---

 Summary: start.sh doesn't return any error message even namenode 
is not up.
 Key: HDFS-11143
 URL: https://issues.apache.org/jira/browse/HDFS-11143
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yufei Gu
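
For illustration, the failure currently only shows up if the user checks by
hand after running the start script, e.g. roughly:

    # the start script exits 0 even when the NameNode fails to come up,
    # so the status has to be checked manually afterwards:
    sbin/start-dfs.sh
    jps | grep -q NameNode || echo "NameNode is not running"
    bin/hdfs dfsadmin -report   # errors out if the NameNode is down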










[jira] [Created] (HDFS-8926) Update the distcp document for new improvements by using snapshot diff report

2015-08-19 Thread Yufei Gu (JIRA)
Yufei Gu created HDFS-8926:
--

 Summary: Update the distcp document for new improvements by using 
snapshot diff report
 Key: HDFS-8926
 URL: https://issues.apache.org/jira/browse/HDFS-8926
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, documentation
Reporter: Yufei Gu
Assignee: Yufei Gu








[jira] [Created] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-07-28 Thread Yufei Gu (JIRA)
Yufei Gu created HDFS-8828:
--

 Summary: Utilize Snapshot diff report to build copy list in distcp
 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yufei Gu
Assignee: Yufei Gu


Some users reported a huge time cost to build the file copy list in distcp (30 hours 
with 1.6M files). We can leverage the snapshot diff report to build a file copy list 
that includes only the files/dirs changed between two snapshots (or between a 
snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list 
building time; 2. fewer file-copy MR jobs.

The HDFS snapshot diff report provides information about file/directory creation, 
deletion, rename and modification between two snapshots, or between a snapshot and a 
normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the 
default distcp for the rest. So it still relies on the default distcp to build the 
copy list, which traverses all files under the source dir. This patch will build the 
copy list based on the snapshot diff report. 
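
For context, the intended workflow builds on the existing snapshot tooling and 
distcp's -diff option; the paths and snapshot names below are just a sketch:

    # allow snapshots on the source dir and take one before and after the changes
    hdfs dfsadmin -allowSnapshot /src
    hdfs dfs -createSnapshot /src s1
    # ... workload modifies /src ...
    hdfs dfs -createSnapshot /src s2

    # inspect what changed between the two snapshots
    hdfs snapshotDiff /src s1 s2

    # incremental copy driven by the snapshot diff instead of a full source traversal
    # (the target must already be in sync with snapshot s1 from a previous copy)
    hadoop distcp -update -diff s1 s2 /src /dst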


