Hi Jihoon,

I agree with you that the root cause is the lack of automated tests
against the released Docker images.

For now, because Docker volume permissions behave differently on macOS
and Linux, I suggest the following steps to verify released Docker
images:

1. Execute [docker volume prune] to remove all previous volumes, so that
the host environment is clean before starting the Druid cluster with
docker-compose.
2. Run some basic tests (e.g. native ingestion) on both macOS and Linux.
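A minimal shell sketch of the two steps above. It assumes it is run from the directory that holds the example docker-compose.yml and that the router is published on localhost:8888; both are assumptions, and the helper names `wait_for_http_ok` and `verify_release_image` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the manual verification steps. Assumptions: run next to the
# example docker-compose.yml; router published on localhost:8888.
set -euo pipefail

# Poll a URL until it answers with a 2xx status, or give up.
wait_for_http_ok() {
  local url="$1" attempts="${2:-60}" interval="${3:-5}" i
  for ((i = 0; i < attempts; i++)); do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

verify_release_image() {
  # Step 1: stop any previous cluster, then drop unused volumes so that
  # stale, root-owned data from earlier runs is not reused.
  # On newer Docker Engine releases, `prune` only removes anonymous
  # volumes unless `--all` is also given.
  docker-compose down
  docker volume prune -f
  docker-compose up -d
  # Step 2 starts here: wait for the router to report healthy, then run
  # native ingestion (e.g. the tutorial) by hand on macOS and Linux.
  wait_for_http_ok "http://localhost:8888/status/health"
}
```

Source the file and call `verify_release_image` on each platform; the ingestion test itself is still a manual step.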



On Fri, May 21, 2021 at 10:42 AM Jihoon Son <jihoon...@apache.org> wrote:

> Hey Frank, thank you for looking into this problem! I haven't had a
> chance to look at it closely yet, but your suggested fix seems valid.
>
> However, I don't think this should be a blocker for 0.21.1, because:
> 1) If you use non-local deep storage, the Docker cluster spins up
> properly and ingestion succeeds. I tested this myself using S3 deep
> storage. If you want to use local deep storage, you can still work
> around the bug by setting the permission manually.
> 2) We cannot fix every bug in one release. We don't know how many bugs
> there are, so trying to fix all of them could delay the release
> forever. This is why we don't treat non-regression bugs as release
> blockers. https://github.com/apache/druid/pull/11167 is a regression
> that was first introduced in 0.21.0, which is why we are doing 0.21.1
> to fix it. For non-regression bugs, I think our policy of moving
> forward, rather than holding ourselves back and spending a lot of time
> on the release, is reasonable.
>
> The real problem I see here is the lack of tests for Docker builds.
> None of our tests currently use the Docker image that we release. We
> use Docker for integration tests, but the image used there is
> different. In the longer term, we should use the same Docker image
> that we release for integration tests, but this will be a substantial
> amount of work. So, as a short-term solution, we can start with a
> simpler test that verifies only the Docker image, such as building the
> image and spinning up a cluster with it. I would suggest adding such
> tests, as well as fixing the bugs reported in the Docker image, in
> 0.22.0 instead of further delaying 0.21.1. I believe we will start the
> discussion for the 0.22.0 release soon.
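One possible shape for the short-term image test Jihoon describes, as a hypothetical shell sketch. It assumes it runs from the root of a Druid source checkout, that the Dockerfile lives at distribution/docker/Dockerfile, and that the image's default user is the non-root `druid` user; the function names are made up for illustration. Instead of a full cluster, it checks the specific failure discussed in this thread: whether that user can write to a fresh volume mounted at /opt/data.

```bash
#!/usr/bin/env bash
# Hypothetical image smoke test. Assumptions: run from the root of a Druid
# source checkout; Dockerfile at distribution/docker/Dockerfile; the image's
# default user is the non-root `druid` user and the image provides `sh`.
set -euo pipefail

build_image() {
  local tag="$1"
  docker build -t "$tag" -f distribution/docker/Dockerfile .
}

# Mount a brand-new named volume at /opt/data and try to write to it as the
# image's default user. Docker seeds an empty named volume with the contents
# and ownership of the image's own /opt/data, so the write fails when the
# image does not create that directory for `druid`.
smoke_test_data_volume() {
  local image="$1" volume="druid-smoke-$$" rc=0
  docker volume create "$volume" >/dev/null
  docker run --rm -v "$volume:/opt/data" --entrypoint sh "$image" \
    -c 'touch /opt/data/.write-probe' || rc=$?
  # Always clean up the throwaway volume, then report the probe result.
  docker volume rm "$volume" >/dev/null
  return "$rc"
}
```

For example: `build_image druid:smoke && smoke_test_data_volume druid:smoke`. Note that this seeding only applies to named volumes; a bind-mounted host directory that Docker has to create is created as root regardless of the image.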
>
> Jihoon
>
> On Thu, May 20, 2021 at 12:13 AM frank chen <frankc...@apache.org> wrote:
> >
> > Hi Clint, Jihoon,
> >
> > I looked into this problem; it is also caused by the VOLUME problem.
> >
> > The /opt/data directory is mounted as a volume in docker-compose.yml,
> > but it is not created in the Dockerfile. As a result, Docker creates
> > the directory as root when the container starts.
> > This problem does not occur on macOS because osxfs handles file
> > ownership differently (see the Stack Overflow question "Docker on
> > MacOSX does not translate file ownership correctly in volumes":
> > <https://stackoverflow.com/questions/43097341/docker-on-macosx-does-not-translate-file-ownership-correctly-in-volumes>).
> >
> > I also checked older issues on GitHub; this problem was actually
> > already reported in https://github.com/apache/druid/issues/9779, so it
> > has existed for a long time.
> >
> > IMO, we SHOULD NOT release this candidate unless we fix this problem,
> > because:
> > 1. ingestion cannot succeed on Linux;
> > 2. users have already run into enough Docker-related problems, so we
> > had better resolve them all in a single release.
> >
> > I'm going to fix this problem by adding the following instruction to
> > the Dockerfile:
> >
> > RUN mkdir /opt/data \
> >    && chown druid:druid /opt/data \
> >    && chmod 775 /opt/data
> >
> > Thanks.
> >
> >
> > On Thu, May 20, 2021 at 12:07 PM Jihoon Son <jihoon...@apache.org> wrote:
> >
> > > I could reproduce the same issue Clint saw on my Linux machine.
> > > I didn't look into it closely, but agree that this doesn't seem like a
> > > blocker for the 0.21.1 release since the issue is about the permission
> > > on the local deep storage. However, for the users who want to try
> > > Druid out using docker, we should call it out in the release notes.
> > >
> > > So, my vote is +1.
> > >
> > > src package:
> > > - verified signature/checksum
> > > - LICENSE/NOTICE present
> > > - compiled, ran the license check/unit tests
> > > - built binary distribution, ran both batch and kafka ingestion using
> > > tutorial, ran some queries
> > >
> > > binary package:
> > > - verified signature/checksum
> > > - LICENSE/NOTICE present
> > > - ran both batch and kafka ingestion using tutorial, ran some queries
> > >
> > > docker:
> > > - verified checksum
> > > - started cluster with docker-compose with the modified permission on
> > > the local deep storage, ingested some data and ran some
> > > queries
> > >
> > > On Wed, May 19, 2021 at 2:21 AM frank chen <frankc...@apache.org> wrote:
> > > >
> > > > Hi Clint,
> > > >
> > > > According to the instruction `COPY --chown=druid:druid --from=builder /opt /opt`
> > > > in the Dockerfile, the user `druid` has permission to create
> > > > directories under /opt.
> > > >
> > > >
> > > > I'm not sure what happened. I will check RC1 on a fresh Linux
> > > > environment tomorrow to see whether the problem occurs there.
> > > >
> > > >
> > > > On Wed, May 19, 2021 at 6:23 AM Clint Wylie <cwy...@apache.org> wrote:
> > > >
> > > > > Hmm, I'm actually still seeing a permissions-related issue with the
> > > > > rc1 Docker image on Linux (Docker on macOS, where I initially
> > > > > tested, seems to work OK).
> > > > >
> > > > > middlemanager    | 2021-05-18T21:20:39,239 INFO [forking-task-runner-0] org.apache.druid.indexing.overlord.ForkingTaskRunner - Exception caught during execution
> > > > > middlemanager    | org.apache.druid.java.util.common.IOE: Unable to create task log dir[/opt/data/indexing-logs]
> > > > > middlemanager    |      at org.apache.druid.indexing.common.tasklogs.FileTaskLogs.pushTaskLog(FileTaskLogs.java:59) ~[druid-indexing-service-0.21.1.jar:0.21.1]
> > > > > middlemanager    |      at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:386) [druid-indexing-service-0.21.1.jar:0.21.1]
> > > > > middlemanager    |      at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:137) [druid-indexing-service-0.21.1.jar:0.21.1]
> > > > > middlemanager    |      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]
> > > > > middlemanager    |      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
> > > > > middlemanager    |      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
> > > > > middlemanager    |      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
> > > > > middlemanager    | 2021-05-18T21:20:39,247 INFO [forking-task-runner-0] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_parallel_wikipedia_gmkopeph_2021-05-18T21:20:10.899Z
> > > > >
> > > > > The 'storage' directory used for task logs and deep storage in the
> > > > > docker-compose file seems to be owned by root. If I change its
> > > > > ownership to the container's user, it works.
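Clint's manual workaround can be sketched as a small shell helper. This assumes the example compose file bind-mounts ./storage into the containers and that the image's `druid` user has UID/GID 1000; both are assumptions worth checking first, for example with `docker run --rm --entrypoint id apache/druid:0.21.1-rc1` if the image ships an `id` binary. The helper name and the `SUDO` override are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the ownership workaround. Assumptions: ./storage is the host
# directory bind-mounted for task logs and deep storage, and the image's
# `druid` user has UID/GID 1000.
set -euo pipefail

# fix_storage_ownership [dir] [uid] [gid]
# Set SUDO= (empty) to run chown without sudo, e.g. when already root.
fix_storage_ownership() {
  local dir="${1:-./storage}" uid="${2:-1000}" gid="${3:-1000}"
  local sudo_cmd="${SUDO-sudo}"
  # Create the directory ourselves so Docker does not create it as root.
  mkdir -p "$dir"
  # Left unquoted on purpose: an empty sudo_cmd expands to nothing.
  $sudo_cmd chown -R "$uid:$gid" "$dir"
}
```

Run it before `docker-compose up`, e.g. `fix_storage_ownership ./storage`.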
> > > > >
> > > > > I am not sure this is necessarily a blocker: the containers
> > > > > themselves seem to work OK, and the issue appears to be limited to
> > > > > the locally attached volume used for deep storage by the example
> > > > > compose file (in which I also forgot to update the version to
> > > > > 0.21.1 for this release). Presumably everything would work if using
> > > > > something like S3 for deep storage, though I haven't had the chance
> > > > > to confirm this yet.
> > > > >
> > > > >
> > > > > On Fri, May 14, 2021 at 12:42 PM Clint Wylie <cwy...@apache.org> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I have created a build for Apache Druid 0.21.1, release
> > > > > > candidate 1.
> > > > > >
> > > > > > Thanks to everyone who has helped contribute to the release! You
> > > > > > can read the proposed release notes here:
> > > > > > https://github.com/apache/druid/issues/11249
> > > > > >
> > > > > > The release candidate has been tagged in GitHub as
> > > > > > druid-0.21.1-rc1 (9d142a2f19cceef38173e8d463a8cc1dfe1cb7ec),
> > > > > > available here:
> > > > > > https://github.com/apache/druid/releases/tag/druid-0.21.1-rc1
> > > > > >
> > > > > > The artifacts to be voted on are located here:
> > > > > > https://dist.apache.org/repos/dist/dev/druid/0.21.1-rc1/
> > > > > >
> > > > > > A staged Maven repository is available for review at:
> > > > > > https://repository.apache.org/content/repositories/orgapachedruid-1024/
> > > > > >
> > > > > > Staged druid.apache.org website documentation is available here:
> > > > > > https://druid.staged.apache.org/docs/0.21.1/design/index.html
> > > > > >
> > > > > > A Docker image containing the binary of the release candidate can
> > > > > > be retrieved via:
> > > > > > docker pull apache/druid:0.21.1-rc1
> > > > > >
> > > > > > artifact checksums
> > > > > > src:
> > > > > > 8ac8267e9fa8aebbd7e2aa7bf30cfa598083eea4e862856f2e28521443a74c8b4f3d7ede5ba9837ee802028913995567d8f10e991f2aac069fea0155b30083f9
> > > > > > bin:
> > > > > > 170a8861c2bd00078689316013b32d16e110c021be87bf183898b4eb3ffc1c06ca56e3c7b92ca38f9732ba3cf4ff4c1de22a68d7a4b93e6dc271b3c994416053
> > > > > > docker:
> > > > > > 16b47d4d03b41aa6c5784b6f356d51bb7d3c50198cb5f8a289d0e9c4d425bb26
> > > > > >
> > > > > > Release artifacts are signed with the following key:
> > > > > > https://people.apache.org/keys/committer/cwylie.asc
> > > > > >
> > > > > > This key and the keys of other committers can also be found in
> > > > > > the project's KEYS file here:
> > > > > > https://dist.apache.org/repos/dist/release/druid/KEYS
> > > > > >
> > > > > > (If you are a committer, please feel free to add your own key to
> > > > > > that file by following the instructions in the file's header.)
> > > > > >
> > > > > >
> > > > > > Verify checksums:
> > > > > > diff <(shasum -a512 apache-druid-0.21.1-src.tar.gz | \
> > > > > > cut -d ' ' -f1) \
> > > > > > <(cat apache-druid-0.21.1-src.tar.gz.sha512 ; echo)
> > > > > >
> > > > > > diff <(shasum -a512 apache-druid-0.21.1-bin.tar.gz | \
> > > > > > cut -d ' ' -f1) \
> > > > > > <(cat apache-druid-0.21.1-bin.tar.gz.sha512 ; echo)
> > > > > >
> > > > > > Verify signatures:
> > > > > > gpg --verify apache-druid-0.21.1-src.tar.gz.asc \
> > > > > > apache-druid-0.21.1-src.tar.gz
> > > > > >
> > > > > > gpg --verify apache-druid-0.21.1-bin.tar.gz.asc \
> > > > > > apache-druid-0.21.1-bin.tar.gz
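The checksum and signature commands above can also be wrapped in a loop over both artifacts. A minimal sketch, assuming (as the `diff` commands do) that each .sha512 file contains only the hex digest; `verify_sha512` and `verify_release` are hypothetical names, and `sha512sum` is used where available with `shasum -a 512` as the fallback.

```bash
#!/usr/bin/env bash
# Reusable version of the checksum/signature checks. Assumes each
# <artifact>.sha512 file holds only the hex digest, as the diff commands do.
set -euo pipefail

verify_sha512() {
  local artifact="$1" expected actual
  # Strip any trailing newline or whitespace from the published digest.
  expected="$(tr -d '[:space:]' < "${artifact}.sha512")"
  if command -v sha512sum >/dev/null 2>&1; then
    actual="$(sha512sum "$artifact" | cut -d ' ' -f1)"
  else
    actual="$(shasum -a 512 "$artifact" | cut -d ' ' -f1)"
  fi
  [ "$expected" = "$actual" ]
}

verify_release() {
  local artifact
  for artifact in "$@"; do
    verify_sha512 "$artifact" || { echo "checksum mismatch: $artifact" >&2; return 1; }
    gpg --verify "${artifact}.asc" "$artifact" || return 1
  done
}
```

For example: `verify_release apache-druid-0.21.1-src.tar.gz apache-druid-0.21.1-bin.tar.gz`.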
> > > > > >
> > > > > > Please review the proposed artifacts and vote. Note that Apache has
> > > > > > specific requirements that must be met before +1 binding votes can
> > > > > > be cast by PMC members. Please refer to the policy at
> > > > > > http://www.apache.org/legal/release-policy.html#policy for more
> > > > > > details.
> > > > > >
> > > > > > As part of the validation process, the release artifacts can be
> > > > > > generated from source by running:
> > > > > > mvn clean install -Papache-release,dist -Dgpg.skip
> > > > > >
> > > > > > The RAT license check can be run from source by:
> > > > > > mvn apache-rat:check -Prat
> > > > > >
> > > > > > This vote will be open for at least 72 hours. The vote will pass if
> > > > > > a majority of at least three +1 PMC votes are cast.
> > > > > >
> > > > > > [ ] +1 Release this package as Apache Druid 0.21.1
> > > > > > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> > > > > > [ ] -1 Do not release this package because...
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > For additional commands, e-mail: dev-h...@druid.apache.org
> > >
> > >
>
