[ https://issues.apache.org/jira/browse/IMPALA-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811317#comment-16811317 ]
Michael Ho edited comment on IMPALA-8394 at 4/5/19 10:30 PM:
-------------------------------------------------------------

Thanks [~stakiar]. I just noticed that we could be issuing multiple calls to {{ReadFromPos}} in some cases, so yes, we most likely need to force a seek to move the offset to the right position, since we are using an exclusive file handle for S3. Let me give that a try.

One piece I had missed is that we always seek when using a "borrowed" file handle in {{ReadFromPosInternal}}. I misread that part and assumed that we would always seek before reading, which apparently is not the case when using an exclusive file handle.

{noformat}
if (is_borrowed_fh) {
  if (hdfsSeek(hdfs_fs_, hdfs_file, position_in_file) != 0) {
    return Status(TErrorCode::DISK_IO_ERROR, GetBackendString(),
        Substitute("Error seeking to $0 in file: $1: $2", position_in_file,
            *scan_range_->file_string(), GetHdfsErrorMsg("")));
  }
}
{noformat}

was (Author: kwho):

Thanks [~stakiar]. I just noticed that we could be issuing multiple calls to {{ReadFromPos}} in some cases, so yes, we most likely need to force a seek to move the offset to the right position, since we are using an exclusive file handle for S3. Let me give that a try.

One piece I had missed is that we always seek when using the file handle cache in {{ReadFromPosInternal}}. I misread that part and assumed that we would always seek before reading, which apparently is not the case:

{noformat}
if (is_borrowed_fh) {
  if (hdfsSeek(hdfs_fs_, hdfs_file, position_in_file) != 0) {
    return Status(TErrorCode::DISK_IO_ERROR, GetBackendString(),
        Substitute("Error seeking to $0 in file: $1: $2", position_in_file,
            *scan_range_->file_string(), GetHdfsErrorMsg("")));
  }
}
{noformat}

> Inconsistent data read from S3a connector
> -----------------------------------------
>
> Key: IMPALA-8394
> URL: https://issues.apache.org/jira/browse/IMPALA-8394
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0, Impala 3.3.0
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Critical
>
> While testing a build with remote data cache (https://github.com/michaelhkw/impala/commits/remote-cache-debug) with S3, it was noticed that data read back from S3 through the HDFS S3 adaptor was inconsistent. This was confirmed by computing the checksum of the buffer right after a successful read. The following are the activities of 2 threads in the log.
>
> Both threads 18922 and 18924 tried to look up s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq at offset: 89814317. Both of them hit a cache miss. They both read from S3 for the content. Thread 18924 won the race to insert into the cache. When 18922 came around later to try to insert the same entry into the cache, it noticed that the checksum of the content inserted by thread 18924 was different from its own content.
>
> Please note that the checksum of the bytes read from S3 was computed and logged in {{hdfs-file-reader.cc}} before the insertion into the cache (which also computed the checksum again), and the inconsistency was already observed in {{hdfs-file-reader.cc}}, with thread 18924 computing {{8299739883147237483}} while thread 18922 computed {{9118051972380785265}}.
>
> We re-ran the same experiment with {{--use_hdfs_pread=true}} and the problem went away. While I don't rule out bugs in the cache prototype at this point, the debugging so far suggests the content read back from S3 via the HDFS S3a connector is inconsistent when pread is disabled. It could be that we inadvertently shared the file handle somehow, or there are some race conditions in the S3a connector which got exposed by the timing change with the cache enabled.
>
> FWIW, we also ran the same experiment in an HDFS remote read configuration and it was not reproducible there either.
>
> Thread 18924:
> {noformat}
> I0405 12:02:15.316999 18924 data-cache.cc:344] ed4c2ab7791b5883:9f1507450000005f] Looking up s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 bytes_read: 0 buffer: 4d600000
> I0405 12:02:15.593314 18924 hdfs-file-reader.cc:185] ed4c2ab7791b5883:9f1507450000005f] Caching file s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_read 8332914 checksum 8299739883147237483
> I0405 12:02:15.596087 18924 data-cache.cc:233] ed4c2ab7791b5883:9f1507450000005f] Storing file /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296 len 8332914 checksum 8299739883147237483
> I0405 12:02:15.602699 18924 data-cache.cc:361] ed4c2ab7791b5883:9f1507450000005f] Storing s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 buffer: 4d600000 stored: true
> {noformat}
>
> Thread 18922:
> {noformat}
> I0405 12:02:15.011065 18922 data-cache.cc:344] ed4c2ab7791b5883:9f150745000000da] Looking up s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 bytes_read: 0 buffer: 59200000
> I0405 12:02:16.281126 18922 hdfs-file-reader.cc:185] ed4c2ab7791b5883:9f150745000000da] Caching file s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_read 8332914 checksum 9118051972380785265
> I0405 12:02:16.282948 18922 data-cache.cc:166] ed4c2ab7791b5883:9f150745000000da] Storing duplicated file /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296 len 8332914 checksum 8299739883147237483 buffer checksum: 9118051972380785265
> E0405 12:02:16.282974 18922 data-cache.cc:171] ed4c2ab7791b5883:9f150745000000da] Write checksum mismatch for file /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296 entry len: 8332914 store_len: 8332914 Expected 8299739883147237483, Got 9118051972380785265.
> I0405 12:02:16.283023 18922 data-cache.cc:361] ed4c2ab7791b5883:9f150745000000da] Storing s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 buffer: 59200000 stored: false
> {noformat}
>
> The problem is quite reproducible with TPCDS Q28 at TPCDS 3000 with parquet format.
>
> {noformat}
> select *
> from (select avg(ss_list_price) B1_LP
>             ,count(ss_list_price) B1_CNT
>             ,count(distinct ss_list_price) B1_CNTD
>       from store_sales
>       where ss_quantity between 0 and 5
>         and (ss_list_price between 185 and 185+10
>              or ss_coupon_amt between 10548 and 10548+1000
>              or ss_wholesale_cost between 6 and 6+20)) B1,
>      (select avg(ss_list_price) B2_LP
>             ,count(ss_list_price) B2_CNT
>             ,count(distinct ss_list_price) B2_CNTD
>       from store_sales
>       where ss_quantity between 6 and 10
>         and (ss_list_price between 28 and 28+10
>              or ss_coupon_amt between 6100 and 6100+1000
>              or ss_wholesale_cost between 27 and 27+20)) B2,
>      (select avg(ss_list_price) B3_LP
>             ,count(ss_list_price) B3_CNT
>             ,count(distinct ss_list_price) B3_CNTD
>       from store_sales
>       where ss_quantity between 11 and 15
>         and (ss_list_price between 173 and 173+10
>              or ss_coupon_amt between 6371 and 6371+1000
>              or ss_wholesale_cost between 32 and 32+20)) B3,
>      (select avg(ss_list_price) B4_LP
>             ,count(ss_list_price) B4_CNT
>             ,count(distinct ss_list_price) B4_CNTD
>       from store_sales
>       where ss_quantity between 16 and 20
>         and (ss_list_price between 101 and 101+10
>              or ss_coupon_amt between 2938 and 2938+1000
>              or ss_wholesale_cost between 21 and 21+20)) B4,
>      (select avg(ss_list_price) B5_LP
>             ,count(ss_list_price) B5_CNT
>             ,count(distinct ss_list_price) B5_CNTD
>       from store_sales
>       where ss_quantity between 21 and 25
>         and (ss_list_price between 8 and 8+10
>              or ss_coupon_amt between 5093 and 5093+1000
>              or ss_wholesale_cost between 50 and 50+20)) B5,
>      (select avg(ss_list_price) B6_LP
>             ,count(ss_list_price) B6_CNT
>             ,count(distinct ss_list_price) B6_CNTD
>       from store_sales
>       where ss_quantity between 26 and 30
>         and (ss_list_price between 110 and 110+10
>              or ss_coupon_amt between 2276 and 2276+1000
>              or ss_wholesale_cost between 36 and 36+20)) B6
> limit 100;
> {noformat}
>
> cc'ing [~stakiar], [~joemcdonnell] [~lv] [~tlipcon] [~drorke]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
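To make the failure mode discussed in the comment concrete, here is a minimal sketch of why skipping the seek matters when {{ReadFromPos}} is called more than once on the same handle. It is a toy in-memory model, not Impala or libhdfs code: {{ToyHandle}}, {{ToyRead}}, {{ToySeek}}, {{ToyPread}} and the {{always_seek}} parameter are hypothetical names introduced only for illustration, and the real handle semantics may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a file handle that carries an implicit read
// offset. This is an assumption for illustration, not the libhdfs type.
struct ToyHandle {
  explicit ToyHandle(const std::string& d) : data(&d), pos(0) {}
  const std::string* data;  // backing "file" contents
  std::size_t pos;          // implicit offset, advanced by every ToyRead()
};

// Non-positional read: starts wherever the previous read left the offset.
std::size_t ToyRead(ToyHandle& fh, char* buf, std::size_t len) {
  assert(fh.pos <= fh.data->size());
  const std::size_t n = std::min(len, fh.data->size() - fh.pos);
  std::copy_n(fh.data->data() + fh.pos, n, buf);
  fh.pos += n;
  return n;
}

// Moves the implicit offset, clamped to the end of the "file".
void ToySeek(ToyHandle& fh, std::size_t pos) {
  fh.pos = std::min(pos, fh.data->size());
}

// Positional read: neither uses nor modifies the implicit offset.
std::string ToyPread(const ToyHandle& fh, std::size_t position_in_file,
                     std::size_t len) {
  return fh.data->substr(std::min(position_in_file, fh.data->size()), len);
}

// Reads up to len bytes that are supposed to start at position_in_file.
// With always_seek == false this models the path that trusts the handle's
// current offset instead of seeking first.
std::string ReadFromPos(ToyHandle& fh, std::size_t position_in_file,
                        std::size_t len, bool always_seek) {
  if (always_seek) ToySeek(fh, position_in_file);
  std::string out(len, '\0');
  out.resize(ToyRead(fh, &out[0], len));
  return out;
}
```

With a ten-byte "file" containing {{0123456789}}, a first {{ReadFromPos(fh, 2, 3, true)}} returns {{234}} and leaves the offset at 5. A second call for the same position with {{always_seek}} set to false returns {{567}}, while seeking first or using {{ToyPread(fh, 2, 3)}} returns {{234}} again. That mirrors, under the stated assumptions, why forcing a seek or enabling {{--use_hdfs_pread=true}} would be expected to make repeated reads consistent.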