[ https://issues.apache.org/jira/browse/IMPALA-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704144#comment-16704144 ]

ASF subversion and git services commented on IMPALA-7804:
---------------------------------------------------------

Commit f1a3c47959e17d301f970b7ea2755067d00ae986 in impala's branch 
refs/heads/master from [~joemcdonnell]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f1a3c47 ]

IMPALA-7804: Mitigate s3 consistency issues for test_scanners

test_scanners.py has seen several flaky failures on
S3 due to eventual consistency. The symptom is Impala
being unable to read a file that it just loaded to S3.

Many of the tables used in test_scanners.py are
created via the file_utils helper functions, which
follow this pattern:
1. Copy the files to a temporary directory in HDFS/S3/etc.
2. Create the table.
3. Run LOAD DATA to move the files into the table.

In step #3, LOAD DATA loads the table's metadata
before it runs the move statement on the files.
Subsequent queries on the table therefore do not
need to reload metadata and may read the files
very soon after the move, while S3 can still be
inconsistent.
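
For illustration, a minimal Python sketch of the old
flow; the helper and client names here (for example
create_table_via_load_data and fs_client.copy_from_local)
are hypothetical stand-ins, not the actual file_utils
API, and the column schema is made up:

{noformat}
# Hypothetical sketch of the old helper flow; the names are
# illustrative, not the real file_utils API.
def create_table_via_load_data(impala_client, fs_client, db, table, local_files):
  # 1. Copy the files to a temporary directory in HDFS/S3/etc.
  tmp_dir = '/tmp/%s_%s' % (db, table)
  for local_file in local_files:
    fs_client.copy_from_local(local_file, tmp_dir)
  # 2. Create the table.
  impala_client.execute(
      'CREATE TABLE %s.%s (c1 DECIMAL(10,2)) STORED AS PARQUET' % (db, table))
  # 3. LOAD DATA loads the table metadata first, then moves the files
  #    into the table directory; the next query reuses that metadata and
  #    may read the moved files before S3 is consistent.
  impala_client.execute(
      "LOAD DATA INPATH '%s' INTO TABLE %s.%s" % (tmp_dir, db, table))
{noformat}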

This change reorders the steps to put the files
in place before metadata is loaded, which may
improve the likelihood that the filesystem is
consistent by the time we read it. Specifically,
we now do:
1. Put the files in the directory that the table
   will use when it is created.
2. Create the table.
Neither of these steps loads metadata, so the
next query that runs will load it.
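
A matching sketch of the new ordering, under the same
hypothetical names; the warehouse path and the column
schema are likewise illustrative:

{noformat}
# Hypothetical sketch of the new helper flow; same illustrative names.
def create_table_from_staged_files(impala_client, fs_client, db, table,
                                   local_files, warehouse='/test-warehouse'):
  # 1. Put the files directly in the directory the table will use.
  table_dir = '%s/%s.db/%s' % (warehouse, db, table)
  for local_file in local_files:
    fs_client.copy_from_local(local_file, table_dir)
  # 2. Create the table over that directory. Neither step loads table
  #    metadata, so the load happens on the next query, by which time
  #    the files are more likely to be visible on S3.
  impala_client.execute(
      "CREATE TABLE %s.%s (c1 DECIMAL(10,2)) STORED AS PARQUET "
      "LOCATION '%s'" % (db, table, table_dir))
{noformat}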

Change-Id: Id042496beabe0d0226b347e0653b820fee369f4e
Reviewed-on: http://gerrit.cloudera.org:8080/11959
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Various scanner tests intermittently failing on S3 on different runs
> --------------------------------------------------------------------
>
>                 Key: IMPALA-7804
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7804
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: David Knupp
>            Priority: Blocker
>              Labels: S3
>
> The failures have to do with getting AWS client credentials.
> *query_test/test_scanners.py:696: in test_decimal_encodings*
> _Stacktrace_
> {noformat}
> query_test/test_scanners.py:696: in test_decimal_encodings
>     self.run_test_case('QueryTest/parquet-decimal-formats', vector, unique_database)
> common/impala_test_suite.py:496: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:358: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:438: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:260: in verify_query_result_is_equal
>     assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E     -255.00,-255.00,-255.00 == -255.00,-255.00,-255.00
> E     -255.00,-255.00,-255.00 != -65535.00,-65535.00,-65535.00
> E     -65535.00,-65535.00,-65535.00 != -9999999.99,-9999999.99,-9999999.99
> E     -65535.00,-65535.00,-65535.00 != 0.00,-9999999999999999.99,-999999999999999999999999999999999999.99
> E     -9999999.99,-9999999.99,-9999999.99 != 0.00,0.00,0.00
> E     -9999999.99,-9999999.99,-9999999.99 != 0.00,9999999999999999.99,999999999999999999999999999999999999.99
> E     0.00,-9999999999999999.99,-999999999999999999999999999999999999.99 != 255.00,255.00,255.00
> E     0.00,-9999999999999999.99,-999999999999999999999999999999999999.99 != 65535.00,65535.00,65535.00
> E     0.00,0.00,0.00 != 9999999.99,9999999.99,9999999.99
> E     0.00,0.00,0.00 != None
> E     0.00,9999999999999999.99,999999999999999999999999999999999999.99 != None
> E     0.00,9999999999999999.99,999999999999999999999999999999999999.99 != None
> E     255.00,255.00,255.00 != None
> E     255.00,255.00,255.00 != None
> E     65535.00,65535.00,65535.00 != None
> E     65535.00,65535.00,65535.00 != None
> E     9999999.99,9999999.99,9999999.99 != None
> E     9999999.99,9999999.99,9999999.99 != None
> E     Number of rows returned (expected vs actual): 18 != 9
> {noformat}
> _Standard Error_
> {noformat}
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_huge_num_rows_76a09ef1` CASCADE;
> -- 2018-11-01 09:42:41,140 INFO     MainThread: Started query 4c4bc0e7b69d7641:130ffe7300000000
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_huge_num_rows_76a09ef1`;
> -- 2018-11-01 09:42:42,402 INFO     MainThread: Started query e34d714d6a62cba1:2a8544d000000000
> -- 2018-11-01 09:42:42,405 INFO     MainThread: Created database "test_huge_num_rows_76a09ef1" for test ID "query_test/test_scanners.py::TestParquet::()::test_huge_num_rows[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> 18/11/01 09:42:43 DEBUG s3a.S3AFileSystem: Initializing S3AFileSystem for impala-test-uswest2-1
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Propagating entries under fs.s3a.bucket.impala-test-uswest2-1.
> 18/11/01 09:42:43 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/11/01 09:42:43 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 18/11/01 09:42:43 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: For URI s3a://impala-test-uswest2-1/, using credentials AWSCredentialProviderList: BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider com.amazonaws.auth.InstanceProfileCredentialsProvider@15bbf42f
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.connection.maximum is 1500
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.attempts.maximum is 20
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.connection.establish.timeout is 5000
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.connection.timeout is 200000
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.socket.send.buffer is 8192
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.socket.recv.buffer is 8192
> 18/11/01 09:42:43 DEBUG s3a.S3AFileSystem: Using User-Agent: Hadoop 3.0.0-cdh6.x-SNAPSHOT
> 18/11/01 09:42:44 DEBUG s3a.S3AUtils: Value of fs.s3a.paging.maximum is 5000
> 18/11/01 09:42:44 DEBUG s3a.S3AUtils: Value of fs.s3a.block.size is 33554432
> 18/11/01 09:42:44 DEBUG s3a.S3AUtils: Value of fs.s3a.readahead.range is 65536
> 18/11/01 09:42:44 DEBUG s3a.S3AUtils: Value of fs.s3a.max.total.tasks is 5
> 18/11/01 09:42:44 DEBUG s3a.S3AUtils: Value of fs.s3a.threads.keepalivetime is 60
> 18/11/01 09:42:44 DEBUG s3a.AWSCredentialProviderList: No credentials provided by BasicAWSCredentialsProvider: org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is null
> org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is null
>       at org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider.getCredentials(BasicAWSCredentialsProvider.java:51)
>       at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:117)
> {noformat}
> *query_test/test_scanners.py:351: in test_huge_num_rows*
> _Stacktrace_
> {noformat}
> query_test/test_scanners.py:351: in test_huge_num_rows
>     % unique_database)
> common/impala_connection.py:170: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:182: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:356: in __execute_query
>     self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:377: in wait_for_finished
>     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    Query aborted:Disk I/O error: Failed to open HDFS file s3a://impala-test-uswest2-1/test-warehouse/test_huge_num_rows_76a09ef1.db/huge_num_rows/huge_num_rows.parquet
> E   Error(2): No such file or directory
> E   Root cause: FileNotFoundException: No such file or directory: s3a://impala-test-uswest2-1/test-warehouse/test_huge_num_rows_76a09ef1.db/huge_num_rows/huge_num_rows.parquet
> {noformat}


