[ https://issues.apache.org/jira/browse/IMPALA-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell closed IMPALA-3607.
---------------------------------
    Resolution: Won't Fix

Closing this old issue. Things have changed substantially since this was filed and snapshots aren't as important as they were.

> Reduce test data loading time from snapshot
> -------------------------------------------
>
>                 Key: IMPALA-3607
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3607
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 2.5.0
>            Reporter: Dimitris Tsirogiannis
>            Priority: Minor
>              Labels: test-infra
>
> Loading test data from snapshot takes a significant amount of time
> (~20-30min). Given the amount of data loaded (~4GB), the process of loading
> test data to a local 3-node min-hdfs cluster should be significantly faster.
> The process currently works as follows:
> 1. Download the latest snapshot
> 2. Unzip
> 3. Use the hdfs dfs -put command to copy from the local file system to HDFS
> We believe the bulk of the time goes to step #3 and is attributed to namenode
> overhead. Below are a few ideas we can try to improve this:
> 1. Use a backup and restore approach for HDFS metadata/data that doesn't go
> through the namenode. For example, once data is loaded to an HDFS cluster
> using the old approach, create two snapshots, one for metadata and one for
> data. Loading the test data is then just a matter of unzipping the snapshots
> to the appropriate directories. A similar approach is used to back up and
> restore HDFS clusters
> (http://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_hdfs_metadata_backup.html).
> A Jenkins job would still be responsible for checking for changes in test
> data, doing the slow data loading, and creating the new snapshots.
> 2. Other ideas include the use of EC2 AMIs, Docker and/or HDFS checkpointing.
> 3. Use faster compression/decompression tools.
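
The issue attributes most of the load time to step #3 but does not include measurements. A minimal sketch of how the three steps could be timed separately is shown here. It is not taken from Impala's scripts: the function names, the use of curl, and the assumption that the snapshot is a gzipped tarball are all hypothetical, and only `hdfs dfs -put` comes from the issue text.

```python
import subprocess
import time


def build_steps(snapshot_url, archive, local_dir, hdfs_dir):
    """Return the three load steps as (name, argv) pairs.

    Assumptions (not from the issue): the snapshot is fetched with curl
    and is a .tar.gz archive. Only `hdfs dfs -put` is named in the issue.
    """
    return [
        ("download", ["curl", "-fsSL", "-o", archive, snapshot_url]),
        ("unzip", ["tar", "-xzf", archive, "-C", local_dir]),
        ("hdfs put", ["hdfs", "dfs", "-put", local_dir, hdfs_dir]),
    ]


def time_step(name, argv):
    """Run one command to completion and return (name, elapsed seconds)."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return name, time.perf_counter() - start


def time_all(steps):
    """Run every step in order and return the list of (name, seconds)."""
    return [time_step(name, argv) for name, argv in steps]
```

Calling `time_all(build_steps(url, archive, local_dir, hdfs_dir))` with real values would give one wall-clock figure per step, which is enough to confirm or refute the guess that the `-put` dominates before investing in any of the proposed alternatives.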