[ 
https://issues.apache.org/jira/browse/IMPALA-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114182#comment-17114182
 ] 

Sahil Takiar commented on IMPALA-9759:
--------------------------------------

I looked into this. Uploading the warehouse snapshot via the HDFS CLI takes 
about 60 minutes in the best case (after a lot of tuning of the upload 
command), whereas the AWS CLI takes only four minutes. After thinking through 
the consistency model some more, I don't think our current approach guarantees 
consistency (or, at best, the consistency model is murky because the S3 
consistency docs are confusing).

Here are a few articles I read through:
 * [https://www.quora.com/In-Amazon-S3-is-new-put-after-delete-eventual-consistent]
 * [https://jayendrapatil.com/aws-s3-data-consistency-model/]

I think the danger is that we upload the snapshot to the same location during 
every run of our Jenkins job, so we keep writing and deleting the same set of 
keys. This can cause issues because overwrite puts and deletes are eventually 
consistent. According to the Quora article linked above, the following could 
happen: the client issues a delete request followed by a put request for the 
same key; S3 recognizes the put request first and the delete request second; 
the data from the put ends up being deleted by S3.
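
The reordering described above can be illustrated with a toy model. This is 
only a sketch: a plain dict stands in for the object store, the key name is 
hypothetical, and "recognized order" is an assumption used for illustration 
rather than a description of how S3 is implemented.

```python
# Toy model only: a dict stands in for an object store. It is not S3.

def apply_ops(ops):
    """Apply (op, key, value) tuples in the order the store recognizes them."""
    store = {}
    for op, key, value in ops:
        if op == "put":
            store[key] = value
        elif op == "delete":
            # Deleting a missing key is a no-op, as with an object store.
            store.pop(key, None)
    return store

# Hypothetical key, reused by every job run.
KEY = "test-warehouse/alltypes/year=2009/month=5/data.txt"

# Order in which the client issues the requests.
issued = [("delete", KEY, None), ("put", KEY, "new snapshot data")]
# Assumed order in which the store recognizes them: put first, then delete.
recognized = list(reversed(issued))

in_order = apply_ops(issued)
reordered = apply_ops(recognized)

print(KEY in in_order)   # True: the new data survives
print(KEY in reordered)  # False: the late delete removes the new data
```

In the reordered case the upload reports success but the key is gone 
afterwards, which would look like a file missing from the snapshot.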

I think one simple solution here would be to add a UUID to our test-warehouse 
path, so that all files in our warehouse snapshot end up with unique keys. This 
should guarantee that all data in the warehouse snapshot is uploaded atomically.
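
A minimal sketch of what a per-run unique path might look like. The function 
names, the "test-warehouse-<uuid>" layout, and the choice of "aws s3 sync" for 
the upload are assumptions for illustration, not what our scripts currently 
do. The command is only built here, not executed.

```python
import uuid

def unique_warehouse_prefix(bucket, base="test-warehouse", run_id=None):
    """Return an S3 URI whose prefix is unique to this job run.

    Appending a fresh UUID means no key from a previous run is ever
    overwritten or deleted and then re-created.
    """
    run_id = run_id or uuid.uuid4().hex
    return "s3://{0}/{1}-{2}/".format(bucket, base, run_id)

def upload_command(local_dir, prefix):
    """Build (but do not run) an AWS CLI command that uploads the snapshot."""
    return ["aws", "s3", "sync", local_dir, prefix]

# Example with a fixed run id so the result is predictable.
prefix = unique_warehouse_prefix("impala-test-uswest2-1", run_id="abc123")
print(prefix)  # s3://impala-test-uswest2-1/test-warehouse-abc123/
print(upload_command("/tmp/test-warehouse", prefix))
```

The tests would then need to be pointed at the per-run path (via s3a://) 
instead of the fixed test-warehouse location, and old prefixes would need to 
be cleaned up at some point.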

> Revisit integration of snapshot dataload with s3guard
> -----------------------------------------------------
>
>                 Key: IMPALA-9759
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9759
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Sahil Takiar
>            Priority: Critical
>              Labels: broken-build, flaky
>
> Sometimes, the s3 jobs (which use s3guard for consistency) see test failures 
> due to missing files from the dataload snapshot (see bottom). This may be 
> related to the interaction of snapshot loading with s3guard. We should nail 
> down exactly the right procedure for loading the snapshot. Currently, we do 
> the following:
> 1. Remove any data from the s3bucket via the s3 commandline
> 2. Create the s3guard dynamodb table (or reuse existing one if a previous job 
> failed without deleting the old dynamodb table)
> 3. Prune any existing entries from that table
> 4. Load the snapshot to the s3 bucket
> In theory, this leaves s3guard with an empty dynamodb table and an s3bucket 
> with data. As tests progress and try to access the s3 bucket, s3guard would 
> see that there is no entry in the dynamodb table and then check the 
> underlying s3 bucket.
> We need to revisit these steps and verify that everything is being done 
> correctly.
> {noformat}
> metadata/test_metadata_query_statements.py:70: in test_show_stats
>     self.run_test_case('QueryTest/show-stats', vector, "functional")
> common/impala_test_suite.py:687: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:523: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
>     assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E '2009','1',310,1,'19.95KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1'
>  == '2009','1',310,1,'19.95KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=1'
> E '2009','10',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10'
>  == '2009','10',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=10'
> E '2009','11',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11'
>  == '2009','11',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=11'
> E '2009','12',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12'
>  == '2009','12',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=12'
> E '2009','2',280,1,'18.12KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2'
>  == '2009','2',280,1,'18.12KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=2'
> E '2009','3',310,1,'20.06KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3'
>  == '2009','3',310,1,'20.06KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=3'
> E '2009','4',300,1,'19.61KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4'
>  == '2009','4',300,1,'19.61KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=4'
> E '2009','5',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5'
>  != '2009','5',0,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=5'
> E '2009','6',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=6'
>  == '2009','6',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=6'
> E '2009','7',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=7'
>  == '2009','7',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=7'
> E '2009','8',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=8'
>  == '2009','8',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=8'
> E '2009','9',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=9'
>  == '2009','9',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2009/month=9'
> E '2010','1',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=1'
>  == '2010','1',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=1'
> E '2010','10',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=10'
>  == '2010','10',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=10'
> E '2010','11',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=11'
>  == '2010','11',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=11'
> E '2010','12',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=12'
>  == '2010','12',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=12'
> E '2010','2',280,1,'18.39KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=2'
>  == '2010','2',280,1,'18.39KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=2'
> E '2010','3',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=3'
>  == '2010','3',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=3'
> E '2010','4',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=4'
>  == '2010','4',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=4'
> E '2010','5',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=5'
>  == '2010','5',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=5'
> E '2010','6',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=6'
>  == '2010','6',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=6'
> E '2010','7',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=7'
>  == '2010','7',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=7'
> E '2010','8',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=8'
>  == '2010','8',310,1,'20.36KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=8'
> E '2010','9',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=9'
>  == '2010','9',300,1,'19.71KB','NOT CACHED','NOT 
> CACHED','TEXT','false','s3a://impala-test-uswest2-1/test-warehouse/alltypes/year=2010/month=9'
> E 'Total','',7300,24,'478.45KB','0B','','','','' != 
> 'Total','',6990,24,'478.45KB','0B','','','',''
> {noformat}
> This also shows up in cardinality calculations:
> {noformat}
> metadata/test_explain.py:113: in test_explain_validate_cardinality_estimates
>     check_cardinality(result.data, '7.30K')
> metadata/test_explain.py:98: in check_cardinality
>     query_result, expected_cardinality=expected_cardinality)
> metadata/test_explain.py:86: in check_row_size_and_cardinality
>     assert m.groups()[1] == expected_cardinality
> E assert '6.99K' == '7.30K'
> E - 6.99K
> E + 7.30K
> {noformat}


