[ 
https://issues.apache.org/jira/browse/HAWQ-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726471#comment-16726471
 ] 

Fenggang Wang commented on HAWQ-1094:
-------------------------------------

Hello [~mli],  we have a customer who is still running hawq version 2.0.0. They 
have met this very issue in their environment.
I have already suggested them to upgrade hawq to newest version with fix on 
this issue.
https://github.com/apache/hawq/commit/5f3db03a7b52c4b14575a09f8af90e8a6be69192

On the other hand, they still have questions about this issue.  These two 
questions are just from your description.

Questions:

1. When we detect a bad checksum on the local block, why do we not mark the 
block as corrupt with the NameNode?
2. When we read the block using RemoteBlockReader, why doesn't it detect the 
bad block?

And they would rather add another question
3. Why fsck's output is healthy, but checksum return value from libhdfs3 is 
chechsum error.

Appreciate if you could shed some light on these questions.

thanks
WangFG

> Select on INTERNAL table returns wrong results when hdfs blocks have checksum 
> errors
> ------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1094
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1094
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Fault Tolerance
>            Reporter: Ming LI
>            Assignee: Ming LI
>            Priority: Major
>             Fix For: backlog
>
>
> I created a parquet table and inserted the following values into the table:
> {code}
> sr37228_repro=# select * from number;
>  id
> ----
>   1
>   1
>   1
>   1
>   1
> (5 rows)
> {code}
> I then modified the data in two of the three blocks and tried reading the 
> data again.
> {code}
> Modifying contents of internal table blocks...
> Found hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 in hdfs
> Modifying block 
> /hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
>  on 172.28.21.155
> block_script.sh                                                               
>                                          100%  228     0.2KB/s   00:00
> Modifying block 
> /hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
>  on 172.28.21.156
> block_script.sh                                                               
>                                          100%  228     0.2KB/s   00:00
> Running count query again, this time with bad data in two of the three blocks
>  count |    id
> -------+----------
>      1 |        0
>      2 |        1
>      1 | 16777216
>      1 | 16777217
> (4 rows)
> Checking Showing file health:
> Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
> Connecting to namenode via 
> http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
> FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path 
> /hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
> /hawq_default/16385/16543/17000/10 206 bytes, 1 block(s):  OK
> 0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 
> repl=3 
> [DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
>  
> DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
>  
> DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
> Status: HEALTHY
>  Total size:    206 B
>  Total dirs:    0
>  Total files:   1
>  Total symlinks:                0
>  Total blocks (validated):      1 (avg. block size 206 B)
>  Minimally replicated blocks:   1 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.0
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
> {code}
> When setupBlockReader reads a bad block using the LocalBlockReader, the 
> reader correctly detects a bad checksum.
> {code}
> 2016-09-26 13:02:09.267021 
> PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource 
> manager discovered local host IPv4 address 
> 127.0.0.1",,,,,,,0,,"network_utils.c",210,
> 2016-09-26 13:02:09.267171 
> PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource 
> manager discovered local host IPv4 address 
> 172.28.21.155",,,,,,,0,,"network_utils.c",210,
> 2016-09-26 13:02:16.239048 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping 
> in memory mapping OidInMemHeapMapping",,,,,,"SET log_min_messages TO 
> 'debug5'",0,,"cdbinmemheapam.c",293,
> 2016-09-26 13:02:16.239289 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransactionCommand",,,,,,"SET
>  log_min_messages TO 'debug5'",0,,"postgres.c",3131,
> 2016-09-26 13:02:16.239435 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransaction",,,,,,"SET
>  log_min_messages TO 'debug5'",0,,"xact.c",5103,
> 2016-09-26 13:02:16.239819 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","name: 
> unnamed; blockState:       STARTED; state: INPROGR, xid/subid/cid: 6227/1/0, 
> nestlvl: 1, children: <>",,,,,,"SET log_min_messages TO 
> 'debug5'",0,,"xact.c",5128,
> 2016-09-26 13:02:16.239978 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping 
> in memory mapping OidInMemOnlyMapping",,,,,,"SET log_min_messages TO 
> 'debug5'",0,,"cdbinmemheapam.c",293,
> 2016-09-26 13:02:25.600367 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,,seg1,,,,,"DEBUG5","00000","First char: 'M'; gp_role = 
> 'execute'.",,,,,,,0,,"postgres.c",4737,
> 2016-09-26 13:02:25.600639 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","Message type M 
> received by from libpq, len = 1412",,,,,,,0,,"postgres.c",4813,
> 2016-09-26 13:02:25.600742 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG5","00000","MPP dispatched stmt 
> from QD: explain analyze select * from number;.",,,,,,,0,,"postgres.c",4893,
> 2016-09-26 13:02:25.600847 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","SetupProcessIdentity: 
> receive msg: 
> ProcessIdentity_Begin_slice_1_idx_0_gang_1_cmd_74_writer_t_End_ProcessIdentity",,,,,,,0,,"identity.c",365,
> 2016-09-26 13:02:25.600997 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity is 
> not init",,,,,,,0,,"identity.c",599,
> 2016-09-26 13:02:25.601129 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity: 
> slice 1 id 0 gang num 1 writer t",,,,,,,0,,"identity.c",602,
> 2016-09-26 13:02:25.601250 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG5","00000","Get a temporary 
> directory:/tmp/hawq/segment",,,,,,,0,,"cdbtmpdir.c",48,
> 2016-09-26 13:02:25.601351 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG1","00000","getLocalTmpDirFromSegmentConfig
>  session_id:143 command_id:74 qeidx:0 
> tmpdir:/tmp/hawq/segment",,,,,,,0,,"identity.c",418,
> 2016-09-26 13:02:25.601784 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG3","00000","StartTransactionCommand",,,,,,"explain
>  analyze select * from number;",0,,"postgres.c",3107,
> 2016-09-26 13:02:25.602075 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","StartTransaction",,,,,,"explain
>  analyze select * from number;",0,,"xact.c",5103,
> 2016-09-26 13:02:25.602195 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","name: unnamed; 
> blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 6228/1/0, nestlvl: 
> 1, children: <>",,,,,,"explain analyze select * from 
> number;",0,,"xact.c",5128,
> 2016-09-26 13:02:25.602578 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 0 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602703 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 1 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602836 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 2 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602994 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 3 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603104 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 4 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603211 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 5 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603317 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 6 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603572 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 7 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603751 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 8 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603881 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 9 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604003 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 10 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604110 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 11 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604216 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 12 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604323 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 13 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604555 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 14 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604697 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 15 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604848 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 16 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604959 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 17 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.605064 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 18 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.605591 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","Resource 
> enforcer finds cpu sub-system is disabled",,,,,,"explain analyze select * 
> from number;",0,,"resourceenforcer.c",908,
> 2016-09-26 13:02:25.605716 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Current nice 
> level of the process: 19",,,,,,"explain analyze select * from 
> number;",0,,"postgres.c",283,
> 2016-09-26 13:02:25.605856 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Reniced 
> process to level 19",,,,,,"explain analyze select * from 
> number;",0,,"postgres.c",302,
> 2016-09-26 13:02:25.606073 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","GetSnapshotData
>  setting globalxmin and xmin to 6228",,,,,,"explain analyze select * from 
> number;",0,,"procarray.c",552,
> 2016-09-26 13:02:25.606306 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Inserted entry 
> for query (sessionid=143, commandcnt=74)",,,,,,"explain analyze select * from 
> number;",0,,"workfile_queryspace.c",283,
> 2016-09-26 13:02:25.606748 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Have 
> both IPv6 and IPv4 choices",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",1291,
> 2016-09-26 13:02:25.606978 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket 
> ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * 
> from number;",0,,"ic_udp.c",1303,
> 2016-09-26 13:02:25.607098 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket 
> 6 ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * 
> from number;",0,,"ic_udp.c",1307,
> 2016-09-26 13:02:25.607207 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","bind 
> addrlen 28 fam 10",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",1318,
> 2016-09-26 13:02:25.607320 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> default buffer size 124928 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2200,
> 2016-09-26 13:02:25.607555 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> use buffer size 2097152 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2215,
> 2016-09-26 13:02:25.607678 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> default buffer size 124928 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2200,
> 2016-09-26 13:02:25.607787 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> use buffer size 2097152 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2215,
> 2016-09-26 13:02:25.607939 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","GetSockAddr 
> socket ai_family 2 ai_socktype 2 ai_protocol 17 for 
> 172.28.21.157",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",3058,
> 2016-09-26 13:02:25.608052 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","We 
> are inet6, remote is inet.  Converting to v4 mapped address.",,,,,,"explain 
> analyze select * from number;",0,,"ic_udp.c",3137,
> 2016-09-26 13:02:25.608249 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 0 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608706 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 1 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608836 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 2 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608966 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 3 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609083 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 4 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609200 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 5 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609316 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 6 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609657 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 7 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.613152 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","Parquet 
> metadata file footer length index: 198",,,,,,"explain analyze select * from 
> number;",0,,"cdbparquetfooterprocessor.c",141,
> 2016-09-26 13:02:25.676719 
> PDT,,,p380675,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error 
> log:
> 2016-09-26 13:02:25.676477, p384452, th140708219193472, ERROR cannot setup 
> block reader for Block: [block pool ID: 
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186] file 
> /hawq_default/16385/16543/17000/10 on Datanode: hdw2.hdp.local(172.28.21.155).
> LocalBlockReader.cpp: 127: HdfsIOException: Failed to construct 
> LocalBlockReader for block: [block pool ID: 
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
>         @       
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
>  const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
>         @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
>         @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, 
> bool)
>         @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
>         @       Hdfs::Internal::InputStreamImpl::read(char*, int)
>         @       hdfsRead
>         @       gpfs_hdfs_read
>         @       HdfsRead
>         @       FileRead
>         @       readParquetFooter
>         @       ParquetStorageRead_OpenFile
>         @       parquet_getnext
>         @       ParquetScanNext
>         @       ExecTableScan
>         @       ExecProcNode
>         @       ExecMotion
>         @       ExecProcNode
>         @       ExecutePlan
>         @       ExecutorRun
>         @       PortalRunSelect
>         @       PortalRun
>         @       PostgresMain
>         @       BackendStartup
>         @       ServerLoop
>         @       PostmasterMain
>         @       main
>         @       __libc_start_main
>         @       Unknown
> Caused by
> LocalBlockReader.cpp: 283: HdfsIOException: LocalBlockReader failed to skip 
> from position: 0, length: 0, block: [block pool ID: 
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
>         @       Hdfs::Internal::LocalBlockReader::skip(long)
>         @       
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
>  const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
>         @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
>         @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, 
> bool)
>         @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
>         @       Hdfs::Internal::InputStreamImpl::read(char*, int)
>         @       hdfsRead
>         @       gpfs_hdfs_read
>         @       HdfsRead
>         @       FileRead
>         @       readParquetFooter
>         @       ParquetStorageRead_OpenFile
>         @       parquet_getnext
>         @       ParquetScanNext
>         @       ExecTableScan
>         @       ExecProcNode
>         @       ExecMotion
>         @       ExecProcNode
>         @       ExecutePlan
>         @       ExecutorRun
>         @       PortalRunSelect
>         @       PortalRun
>         @       PostgresMain
>         @       BackendStartup
>         @       ServerLoop
>         @       PostmasterMain
>         @       main
>         @       __libc_start_main
>         @       Unknown
> Caused by
> LocalBlockReader.cpp: 156: ChecksumException: LocalBlockReader checksum not 
> match for block: [block pool ID: BP-2023073008-172.28.21.63-1462922052672 
> block ID 1073742008_1186]
>         @       Hdfs::Internal::LocalBlockReader::readAndVerify(int)
>         @       Hdfs::Internal::LocalBlockReader::skip(long)
>         @       
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
>  const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
>         @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
>         @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, 
> bool)
>         @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
>         @       Hdfs::Internal::InputStreamImpl::read(char*, int)
>         @       hdfsRead
>         @       gpfs_hdfs_read
>         @       HdfsRead
>         @       FileRead
>         @       readParquetFooter
>         @       ParquetStorageRead_OpenFile
>         @       parquet_getnext
>         @       ParquetScanNext
>         @       ExecTableScan
>         @       ExecProcNode
>         @       ExecMotion
>         @       ExecProcNode
>         @       ExecutePlan
>         @       ExecutorRun
>         @       PortalRunSelect
>         @       PortalRun
>         @       PostgresMain
>         @       BackendStartup
>         @       ServerLoop
>         @       PostmasterMain
>         @       main
>         @       __libc_start_main
>         @       Unknown
> retry the same node but disable read shortcircuit 
> feature",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2016-09-26 13:02:25.680638 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  
> {code}
> Even though it correctly detected the bad checksum using the 
> LocalBlockReader, when it calls the RemoteBlockReader it does not appear to 
> detect the bad checksum, and the read is allowed to go through.
> {code}
> sr37228_repro=# select * from number;
>     id
> ----------
>  16777217
>  16777216
>         0
>         1
>         1
> (5 rows)
> Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
> Connecting to namenode via 
> http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
> FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path 
> /hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
> /hawq_default/16385/16543/17000/10 206 bytes, 1 block(s):  OK
> 0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 
> repl=3 
> [DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
>  
> DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
>  
> DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
> Status: HEALTHY
>  Total size:    206 B
>  Total dirs:    0
>  Total files:   1
>  Total symlinks:                0
>  Total blocks (validated):      1 (avg. block size 206 B)
>  Minimally replicated blocks:   1 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.0
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
> The filesystem under path '/hawq_default/16385/16543/17000/10' is HEALTHY
> {code}
> The behavior of InputStreamImpl::setupBlockReader appears to be to: 
> 1. Attempt to read the block locally using LocalBlockReader
> 2. If the local block read fails, attempt to read the block from the next 
> available node using RemoteBlockReader
> 3. Continue to read all the available blocks using RemoteBlockReader until we 
> have no more blocks to read.
> In this case, the RemoteBlockReader appears to ignore the bad checksum in the 
> block, and returns wrong results.
> Questions:
> 1. When we detect a bad checksum on the local block, why do we not mark the 
> block as corrupt with the NameNode?
> 2. When we read the block using RemoteBlockReader, why doesn't it detect the 
> bad block?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to