[ https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297218#comment-17297218 ]
Gabor Szadovszky commented on PARQUET-1992:
-------------------------------------------

[~mayaa],

bq. Benefit - the regular dev flow of building and running unit tests won't require downloading files and connectivity to github

We already need to download a bunch of files from the internet (maven plugins and dependencies). So building/testing from the tarball requires downloading anyway.

bq. If so, they could be run by maven-failsafe-plugin as part of the integration-test/verify phase and missing the interop files would not fail "mvn install" but only "mvn verify"

AFAIK the failsafe plugin is configured to be executed at {{mvn verify}}, and since {{install}} comes after the {{verify}} phase in the lifecycle, {{mvn install}} would still fail if the integration tests could not be executed. BTW, we already have an integration test: [FileEncodingsIT|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/encodings/FileEncodingsIT.java].

bq. 2. Should the files for interop tests be downloaded directly in the test or using submodules in a separate maven profile for integration-test or as part of an existing profile, e.g. ci-test?

I think there is another option: downloading the required files directly from maven. I am not sure which plugin is capable of this or whether it is better than downloading them from the test by java code, but it is still an option.

bq. Git submodules provide flows for handling downloaded file versions - specific to a commit or a branch.

A github download link can contain the hash of the changeset, so it is also capable of handling file versions.

bq. Git submodules manage downloading files only when needed

This is not true in the current situation. We are invoking {{git submodule update}} in the {{initialize}} phase of maven, so we download the whole {{parquet-testing}} repo (at a specific changeset) at least once.

bq. It is aligned with the integration tests in parquet-cpp (arrow)
How does parquet-cpp solve the similar issue with the tarball?

bq. The files can be used for additional interop tests of other features

I agree; this was the first thing I liked about git submodules. Meanwhile, I've started thinking about implementing interoperability tests, and now I think such tests could be implemented in the {{parquet-testing}} repo, as they do not require low-level access to the {{parquet-mr}} classes like unit tests do. My fear about git submodules is that the {{parquet-testing}} repo might grow big, and AFAIK you cannot control which files/directories get synced, only the changeset.

bq. The tarball still won't contain the interop files, so the integration tests will fail on it.

I think we should not add the parquet files into the source tarball in any way.

bq. Anyway, both ways are acceptable, so I'll implement whatever sounds best to the community.

Currently I agree with [~sha...@uber.com] about downloading the required files. Meanwhile, I am curious about the parquet-cpp solution.

bq. BTW, when investigating the profiles, it seems to me that there is an old reference to the "travis" maven profile in the .travis.yml file, though its new name is "ci-test".

That's a good catch! We'll fix it.

> Cannot build from tarball because of git submodules
> ---------------------------------------------------
>
> Key: PARQUET-1992
> URL: https://issues.apache.org/jira/browse/PARQUET-1992
> Project: Parquet
> Issue Type: Bug
> Reporter: Gabor Szadovszky
> Priority: Blocker
>
> Because we use git submodules (to get test parquet files) a simple "mvn clean
> install" fails from the unpacked tarball due to "not a git repository".
> I think we would have 2 options to solve this situation:
> * Include all the required files (even those only needed for testing) in the
> tarball and somehow avoid the git submodule update when executed in a non-git
> environment
> * Make the downloading of the parquet files and the related tests optional so
> it won't fail the build from the tarball
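A rough sketch of how the two options in the issue description could be handled from java code instead of an unconditional {{git submodule update}}, combined with the commit-pinned github download link mentioned in the comment. Everything here is a hypothetical illustration, not parquet-mr code: the {{InteropFiles}} class, the all-zero placeholder commit, the local {{parquet-testing}} directory layout and the cache path are assumptions. The only external convention relied on is GitHub's raw-file URL form {{https://raw.githubusercontent.com/<owner>/<repo>/<commit>/<path>}}. A missing file produces an empty result instead of an exception, so a test could skip itself (e.g. with JUnit's {{Assume.assumeTrue}}) rather than fail a build from the tarball.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical helper (not part of parquet-mr): locates an interop test file
// without making a missing file fatal for the build.
final class InteropFiles {
  // Placeholder for a pinned parquet-testing changeset (assumption, not a real commit).
  static final String PINNED_COMMIT = "0000000000000000000000000000000000000000";

  private InteropFiles() {}

  // First option: only attempt "git submodule update" inside a git checkout.
  // ".git" is a directory in a normal clone and a file in worktrees/submodules,
  // so only its existence is checked; an unpacked source tarball has neither.
  static boolean isGitCheckout(Path projectRoot) {
    return Files.exists(projectRoot.resolve(".git"));
  }

  // GitHub raw-file link pinned to a commit, so the file version is fixed.
  static String pinnedUrl(String commit, String relativePath) {
    return "https://raw.githubusercontent.com/apache/parquet-testing/"
        + commit + "/" + relativePath;
  }

  // Second option: prefer a local copy, otherwise try a pinned download into cacheDir.
  // An empty result means "files unavailable": the caller should skip, not fail.
  static Optional<Path> resolve(Path localRoot, Path cacheDir,
                                String relativePath, boolean allowDownload) {
    Path local = localRoot.resolve(relativePath);
    if (Files.isRegularFile(local)) {
      return Optional.of(local);
    }
    if (!allowDownload) {
      return Optional.empty();
    }
    Path target = cacheDir.resolve(relativePath);
    try {
      Files.createDirectories(target.getParent());
      try (InputStream in = URI.create(pinnedUrl(PINNED_COMMIT, relativePath))
          .toURL().openStream()) {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      }
      return Optional.of(target);
    } catch (IOException e) {
      // Offline build or file not found at that commit: treat as unavailable.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    Path root = Paths.get(".");
    System.out.println("git checkout: " + isGitCheckout(root));
    // Hypothetical layout: parquet-testing checked out under ./parquet-testing.
    Optional<Path> file = resolve(root.resolve("parquet-testing"),
        Paths.get("target", "interop-cache"), "data/alltypes_plain.parquet", false);
    System.out.println(file.map(p -> "using " + p)
        .orElse("interop file unavailable, skipping"));
  }
}
```

Returning an {{Optional}} keeps the "skip vs. fail" decision in the test itself. On the maven side, a comparable effect could come from a profile activated only when a file such as {{.git}} exists, so the submodule step would not run in a tarball build.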