[ https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293536#comment-17293536 ]
Gabor Szadovszky commented on PARQUET-1992:
-------------------------------------------

Some more info about the tar file generation. The tarball is generated by the script [dev/source-release.sh|https://github.com/apache/parquet-mr/blob/master/dev/source-release.sh#L57], which uses the {{git archive}} command. It seems that {{git archive}} does not include the contents of git submodules.

However, this is not necessarily a bad thing. Currently the whole parquet-testing repository is cloned. That is not a big deal at the moment because it is only 136K, but we are planning to extend that repo, and we can never know when someone will upload test files for something that is unrelated to parquet-mr. Also, the content of parquet-testing is not something we would like to include in our source tarball.

In summary, we need a method for downloading the required parquet files that works both from the git repo (during development or in CI) and from the unpacked source tarball.

> Cannot build from tarball because of git submodules
> ---------------------------------------------------
>
>                 Key: PARQUET-1992
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1992
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Gabor Szadovszky
>            Priority: Blocker
>
> Because we use git submodules (to get test parquet files) a simple "mvn clean
> install" fails from the unpacked tarball due to "not a git repository".
> I think we would have 2 options to solve this situation:
> * Include all the required files (even only for testing) in the tarball and
> somehow avoid the git submodule update when executed in a non-git
> environment
> * Make the downloading of the parquet files and the related tests optional so
> it won't fail the build from the tarball

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
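
The requirement stated in the comment (fetch only the required test files, and do so identically from a git checkout and from an unpacked tarball) could be approached roughly as sketched below. This is an illustration under assumptions, not the approach parquet-mr adopted: `BASE_URL`, `REQUIRED_FILES`, `missing_downloads`, and `fetch_all` are hypothetical names, and the URL is a placeholder. The idea is that the plan depends only on the local filesystem and an explicit file list, so no git metadata is needed.

```python
from pathlib import Path
from typing import Dict, Iterable
from urllib.request import urlretrieve

# Hypothetical values for illustration; neither is taken from parquet-mr.
BASE_URL = "https://example.invalid/parquet-testing/raw/<commit>"
REQUIRED_FILES = ("data/sample.parquet",)


def missing_downloads(
    cache_dir: Path,
    required: Iterable[str] = REQUIRED_FILES,
    base_url: str = BASE_URL,
) -> Dict[Path, str]:
    """Map each required file that is not yet cached to its download URL.

    Only the local filesystem is inspected, so the result is the same in a
    git checkout and in a tree produced by `git archive`.
    """
    plan: Dict[Path, str] = {}
    for rel in required:
        target = cache_dir / rel
        if not target.is_file():
            plan[target] = f"{base_url}/{rel}"
    return plan


def fetch_all(plan: Dict[Path, str]) -> None:
    """Download every planned file, creating parent directories as needed."""
    for target, url in plan.items():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(target))
```

Listing the files explicitly also addresses the size concern raised above: unrelated files added to parquet-testing later would not be fetched unless they are added to the list.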