Hey Frank, thank you for looking into this problem! I still haven't
had a chance to look at it closely, but your suggestion looks like a
valid fix for the bug.

However, I don't think this should be a blocker for 0.21.1 because:
1) If you configure non-local deep storage, the docker cluster spins
up properly and ingestion succeeds. I tested this myself using s3
deep storage. If you want to use local deep storage, you can still
work around the bug by setting the permission manually.
2) We cannot fix every bug in a single release. We don't know how many
bugs there are, so trying to fix them all could delay the release
forever. This is why we don't treat non-regression bugs as release
blockers. The bug tracked in
https://github.com/apache/druid/pull/11167 is a regression that was
first introduced in 0.21.0, which is why we are doing 0.21.1. For
non-regression bugs, I think our policy of moving forward is
reasonable, rather than holding the release back and spending a lot of
time on it.
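
The manual workaround from point 1 could look roughly like the sketch
below. It assumes the directory backing /opt/data lives on the host and
is currently owned by root. `fix_storage_owner` is a hypothetical
helper, not something in the Druid repo, and the owner argument is an
assumption: it has to match whatever uid:gid the druid process runs as
inside the container.

```shell
# Hypothetical helper (not part of the Druid repo): create a host-side
# storage directory if needed and hand it to the given owner, so that
# the container user can write task logs and segments into it.
# The owner value is an assumption; pass the uid:gid of the druid user.
fix_storage_owner() {
  dir="$1"
  owner="$2"
  mkdir -p "$dir" || return 1
  # Changing ownership to a different user needs root privileges.
  chown -R "$owner" "$dir"
}
```

This would be run once on the host before starting the cluster, for
example against the directory that the compose file mounts as storage.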

The real problem I see here is the lack of tests for docker builds.
None of our tests currently use the docker image that we release. We
use docker for integration tests, but the image we use there is
different. In the longer term, we should use the same docker image
that we release for integration tests, but this will be pretty tough
work. As a short-term solution, we can start with a simpler test that
verifies only the docker image, such as building the image and
spinning up a cluster with it. I would suggest adding such tests, as
well as fixing the bugs reported against the docker image, in 0.22.0
instead of further delaying 0.21.1. I believe we will start the
discussion of the 0.22.0 release soon.
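
One small piece of such an image check is sketched below. It only
covers the operation that failed in the reported stack trace, namely
creating the task log directory under the data root. The function name
`can_create_task_log_dir` is hypothetical and not part of the Druid
repo, and the sketch assumes it is executed as the druid user inside a
container started from the image under test.

```shell
# Hypothetical smoke check (not part of the Druid repo): verify that the
# current user can create and write the task log directory under the
# given data root. This mirrors the "Unable to create task log dir"
# failure from the reported stack trace.
can_create_task_log_dir() {
  data_root="$1"
  log_dir="$data_root/indexing-logs"
  if mkdir -p "$log_dir" 2>/dev/null && [ -w "$log_dir" ]; then
    echo "ok: $log_dir is writable by $(id -un)"
    return 0
  fi
  echo "fail: cannot create or write $log_dir as $(id -un)"
  return 1
}
```

Inside the container, the check would be called as
`can_create_task_log_dir /opt/data`; a non-zero exit status would fail
the test.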

Jihoon

On Thu, May 20, 2021 at 12:13 AM frank chen <frankc...@apache.org> wrote:
>
> Hi Clint, Jihoon,
>
> I checked this problem; it's caused by the VOLUME problem too.
>
> The /opt/data directory is mounted as a volume in docker-compose.yml, but
> it's not created in the Dockerfile. That means docker creates the directory
> as the root user when it starts the container.
> This problem does not exist on macOS because osxfs handles file ownership
> differently (see "Docker on MacOSX does not translate file ownership
> correctly in volumes" on Stack Overflow:
> <https://stackoverflow.com/questions/43097341/docker-on-macosx-does-not-translate-file-ownership-correctly-in-volumes>
> ).
>
> I also checked old issues on GitHub, and there is in fact an issue that
> reported this problem: https://github.com/apache/druid/issues/9779 .
> This problem has been there for a long time.
>
> IMO, we SHOULD NOT release this candidate unless we fix this problem
> because:
> 1. ingestion has no chance to succeed on Linux
> 2. users have experienced enough docker-related problems, so we had better
> resolve them all in a single release
>
> I'm going to fix this problem by adding the following instructions to the
> Dockerfile:
>
> RUN mkdir /opt/data \
>    && chown druid:druid /opt/data \
>    && chmod 775 /opt/data
>
> Thanks.
>
>
> Jihoon Son <jihoon...@apache.org> wrote on Thu, May 20, 2021 at 12:07 PM:
>
> > I could reproduce the same issue that Clint saw on my Linux machine.
> > I didn't look into it closely, but I agree that this doesn't seem like a
> > blocker for the 0.21.1 release since the issue is about the permission
> > on the local deep storage. However, for the users who want to try
> > Druid out using docker, we should call it out in the release notes.
> >
> > So, my vote is +1.
> >
> > src package:
> > - verified signature/checksum
> > - LICENSE/NOTICE present
> > - compiled, ran the license check/unit tests
> > - built binary distribution, ran both batch and kafka ingestion using
> > tutorial, ran some queries
> >
> > binary package:
> > - verified signature/checksum
> > - LICENSE/NOTICE present
> > - ran both batch and kafka ingestion using tutorial, ran some queries
> >
> > docker:
> > - verified checksum
> > - started cluster with docker-compose with the modified permission on
> > the local deep storage, ingested some data and ran some
> > queries
> >
> > On Wed, May 19, 2021 at 2:21 AM frank chen <frankc...@apache.org> wrote:
> > >
> > > Hi Clint,
> > >
> > > According to the docker instruction `COPY --chown=druid:druid
> > > --from=builder /opt /opt` in the Dockerfile, the user `druid` has
> > > permission to create directories under /opt.
> > >
> > >
> > > I don't know what happened. I will check RC1 tomorrow on a fresh Linux
> > > environment to see if there is such a problem.
> > >
> > >
> > > Clint Wylie <cwy...@apache.org> wrote on Wed, May 19, 2021 at 6:23 AM:
> > >
> > > > Hmm, I'm actually still seeing a permissions-related issue with the rc1
> > > > Docker image on Linux (Docker on macOS seems to work ok, which is
> > > > where I initially tested).
> > > >
> > > > middlemanager    | 2021-05-18T21:20:39,239 INFO [forking-task-runner-0] org.apache.druid.indexing.overlord.ForkingTaskRunner - Exception caught during execution
> > > > middlemanager    | org.apache.druid.java.util.common.IOE: Unable to create task log dir[/opt/data/indexing-logs]
> > > > middlemanager    |      at org.apache.druid.indexing.common.tasklogs.FileTaskLogs.pushTaskLog(FileTaskLogs.java:59) ~[druid-indexing-service-0.21.1.jar:0.21.1]
> > > > middlemanager    |      at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:386) [druid-indexing-service-0.21.1.jar:0.21.1]
> > > > middlemanager    |      at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:137) [druid-indexing-service-0.21.1.jar:0.21.1]
> > > > middlemanager    |      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]
> > > > middlemanager    |      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
> > > > middlemanager    |      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
> > > > middlemanager    |      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
> > > > middlemanager    | 2021-05-18T21:20:39,247 INFO [forking-task-runner-0] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_parallel_wikipedia_gmkopeph_2021-05-18T21:20:10.899Z
> > > >
> > > > The 'storage' directory that is used for task logs and deep storage in
> > > > the docker-compose file seems to belong to root. If I change ownership
> > > > to the user it works.
> > > >
> > > > I am unsure if this is necessarily a blocker because the containers
> > > > themselves do seem to actually work ok, and this seems to be an issue
> > > > with the locally attached volume deep storage used by the example
> > > > compose file (whose version I also forgot to update to 0.21.1 in this
> > > > release). Presumably everything would work ok if using something like
> > > > s3 for deep storage, though I haven't had the chance to confirm this
> > > > yet.
> > > >
> > > >
> > > > On Fri, May 14, 2021 at 12:42 PM Clint Wylie <cwy...@apache.org> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have created a build for Apache Druid 0.21.1, release
> > > > > candidate 1.
> > > > >
> > > > > Thanks to everyone who has helped contribute to the release! You can
> > > > > read the proposed release notes here:
> > > > > https://github.com/apache/druid/issues/11249
> > > > >
> > > > > The release candidate has been tagged in GitHub as
> > > > > druid-0.21.1-rc1 (9d142a2f19cceef38173e8d463a8cc1dfe1cb7ec),
> > > > > available here:
> > > > > https://github.com/apache/druid/releases/tag/druid-0.21.1-rc1
> > > > >
> > > > > The artifacts to be voted on are located here:
> > > > > https://dist.apache.org/repos/dist/dev/druid/0.21.1-rc1/
> > > > >
> > > > > A staged Maven repository is available for review at:
> > > > > https://repository.apache.org/content/repositories/orgapachedruid-1024/
> > > > >
> > > > > Staged druid.apache.org website documentation is available here:
> > > > > https://druid.staged.apache.org/docs/0.21.1/design/index.html
> > > > >
> > > > > A Docker image containing the binary of the release candidate can be
> > > > > retrieved via:
> > > > > docker pull apache/druid:0.21.1-rc1
> > > > >
> > > > > artifact checksums
> > > > > src:
> > > > > 8ac8267e9fa8aebbd7e2aa7bf30cfa598083eea4e862856f2e28521443a74c8b4f3d7ede5ba9837ee802028913995567d8f10e991f2aac069fea0155b30083f9
> > > > > bin:
> > > > > 170a8861c2bd00078689316013b32d16e110c021be87bf183898b4eb3ffc1c06ca56e3c7b92ca38f9732ba3cf4ff4c1de22a68d7a4b93e6dc271b3c994416053
> > > > > docker:
> > > > > 16b47d4d03b41aa6c5784b6f356d51bb7d3c50198cb5f8a289d0e9c4d425bb26
> > > > >
> > > > > Release artifacts are signed with the following key:
> > > > > https://people.apache.org/keys/committer/cwylie.asc
> > > > >
> > > > > This key and the keys of other committers can also be found in the
> > > > > project's KEYS file here:
> > > > > https://dist.apache.org/repos/dist/release/druid/KEYS
> > > > >
> > > > > (If you are a committer, please feel free to add your own key to that
> > > > > file by following the instructions in the file's header.)
> > > > >
> > > > >
> > > > > Verify checksums:
> > > > > diff <(shasum -a512 apache-druid-0.21.1-src.tar.gz | \
> > > > > cut -d ' ' -f1) \
> > > > > <(cat apache-druid-0.21.1-src.tar.gz.sha512 ; echo)
> > > > >
> > > > > diff <(shasum -a512 apache-druid-0.21.1-bin.tar.gz | \
> > > > > cut -d ' ' -f1) \
> > > > > <(cat apache-druid-0.21.1-bin.tar.gz.sha512 ; echo)
> > > > >
> > > > > Verify signatures:
> > > > > gpg --verify apache-druid-0.21.1-src.tar.gz.asc \
> > > > > apache-druid-0.21.1-src.tar.gz
> > > > >
> > > > > gpg --verify apache-druid-0.21.1-bin.tar.gz.asc \
> > > > > apache-druid-0.21.1-bin.tar.gz
> > > > >
> > > > > Please review the proposed artifacts and vote. Note that Apache has
> > > > > specific requirements that must be met before +1 binding votes can be
> > > > > cast by PMC members. Please refer to the policy at
> > > > > http://www.apache.org/legal/release-policy.html#policy for more
> > > > > details.
> > > > >
> > > > > As part of the validation process, the release artifacts can be
> > > > > generated from source by running:
> > > > > mvn clean install -Papache-release,dist -Dgpg.skip
> > > > >
> > > > > The RAT license check can be run from source by:
> > > > > mvn apache-rat:check -Prat
> > > > >
> > > > > This vote will be open for at least 72 hours. The vote will pass if a
> > > > > majority of at least three +1 PMC votes are cast.
> > > > >
> > > > > [ ] +1 Release this package as Apache Druid 0.21.1
> > > > > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> > > > > [ ] -1 Do not release this package because...
> > > > >
> > > > > Thanks!
> > > > >
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
> >
