[ 
https://issues.apache.org/jira/browse/HAWQ-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726541#comment-16726541
 ] 

Ming LI commented on HAWQ-1094:
-------------------------------

Regarding 3), from this link:

[https://community.hortonworks.com/questions/17917/best-way-of-handling-corrupt-or-missing-blocks.html]

If all three replicas are damaged, then 'hdfs fsck' will report that block as 
"corrupt" - i.e. HDFS cannot self-heal the block from any of its replicas.

So if only one replica has an error, fsck will not report the block as corrupt.
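
As a rough illustration of the rule quoted from the link, here is a small Python sketch. The function name and its input are hypothetical, and it assumes the NameNode already knows which replicas are bad; it models the quoted statement only, not the actual NameNode or fsck code.

```python
def fsck_reports_corrupt(replica_is_good):
    """Model of the quoted rule: a block counts as corrupt only when no
    replica is still good.

    replica_is_good: one boolean per replica (hypothetical input; assumes
    the NameNode already knows which replicas are bad).
    """
    return not any(replica_is_good)


# Three replicas, one damaged: the block can still be healed from a good copy.
print(fsck_reports_corrupt([False, True, True]))    # False
# All three replicas damaged: nothing is left to heal from.
print(fsck_reports_corrupt([False, False, False]))  # True
```

Under this rule, a block with two of its three replicas damaged would also not be reported as corrupt, because one good replica remains.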

> Select on INTERNAL table returns wrong results when hdfs blocks have checksum 
> errors
> ------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1094
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1094
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Fault Tolerance
>            Reporter: Ming LI
>            Assignee: Ming LI
>            Priority: Major
>             Fix For: backlog
>
>
> I created a parquet table and inserted the following values into the table:
> {code}
> sr37228_repro=# select * from number;
>  id
> ----
>   1
>   1
>   1
>   1
>   1
> (5 rows)
> {code}
> I then modified the data in two of the three replicas of the block and tried 
> reading the data again.
> {code}
> Modifying contents of internal table blocks...
> Found hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 in hdfs
> Modifying block 
> /hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
>  on 172.28.21.155
> block_script.sh                                                               
>                                          100%  228     0.2KB/s   00:00
> Modifying block 
> /hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
>  on 172.28.21.156
> block_script.sh                                                               
>                                          100%  228     0.2KB/s   00:00
> Running count query again, this time with bad data in two of the three blocks
>  count |    id
> -------+----------
>      1 |        0
>      2 |        1
>      1 | 16777216
>      1 | 16777217
> (4 rows)
> Checking Showing file health:
> Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
> Connecting to namenode via 
> http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
> FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path 
> /hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
> /hawq_default/16385/16543/17000/10 206 bytes, 1 block(s):  OK
> 0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 
> repl=3 
> [DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
>  
> DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
>  
> DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
> Status: HEALTHY
>  Total size:    206 B
>  Total dirs:    0
>  Total files:   1
>  Total symlinks:                0
>  Total blocks (validated):      1 (avg. block size 206 B)
>  Minimally replicated blocks:   1 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.0
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
> {code}
> When setupBlockReader reads a bad block using the LocalBlockReader, the 
> reader correctly detects a bad checksum.
> {code}
> 2016-09-26 13:02:09.267021 
> PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource 
> manager discovered local host IPv4 address 
> 127.0.0.1",,,,,,,0,,"network_utils.c",210,
> 2016-09-26 13:02:09.267171 
> PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource 
> manager discovered local host IPv4 address 
> 172.28.21.155",,,,,,,0,,"network_utils.c",210,
> 2016-09-26 13:02:16.239048 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping 
> in memory mapping OidInMemHeapMapping",,,,,,"SET log_min_messages TO 
> 'debug5'",0,,"cdbinmemheapam.c",293,
> 2016-09-26 13:02:16.239289 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransactionCommand",,,,,,"SET
>  log_min_messages TO 'debug5'",0,,"postgres.c",3131,
> 2016-09-26 13:02:16.239435 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransaction",,,,,,"SET
>  log_min_messages TO 'debug5'",0,,"xact.c",5103,
> 2016-09-26 13:02:16.239819 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","name: 
> unnamed; blockState:       STARTED; state: INPROGR, xid/subid/cid: 6227/1/0, 
> nestlvl: 1, children: <>",,,,,,"SET log_min_messages TO 
> 'debug5'",0,,"xact.c",5128,
> 2016-09-26 13:02:16.239978 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping 
> in memory mapping OidInMemOnlyMapping",,,,,,"SET log_min_messages TO 
> 'debug5'",0,,"cdbinmemheapam.c",293,
> 2016-09-26 13:02:25.600367 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,,seg1,,,,,"DEBUG5","00000","First char: 'M'; gp_role = 
> 'execute'.",,,,,,,0,,"postgres.c",4737,
> 2016-09-26 13:02:25.600639 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","Message type M 
> received by from libpq, len = 1412",,,,,,,0,,"postgres.c",4813,
> 2016-09-26 13:02:25.600742 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG5","00000","MPP dispatched stmt 
> from QD: explain analyze select * from number;.",,,,,,,0,,"postgres.c",4893,
> 2016-09-26 13:02:25.600847 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","SetupProcessIdentity: 
> receive msg: 
> ProcessIdentity_Begin_slice_1_idx_0_gang_1_cmd_74_writer_t_End_ProcessIdentity",,,,,,,0,,"identity.c",365,
> 2016-09-26 13:02:25.600997 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity is 
> not init",,,,,,,0,,"identity.c",599,
> 2016-09-26 13:02:25.601129 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity: 
> slice 1 id 0 gang num 1 writer t",,,,,,,0,,"identity.c",602,
> 2016-09-26 13:02:25.601250 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG5","00000","Get a temporary 
> directory:/tmp/hawq/segment",,,,,,,0,,"cdbtmpdir.c",48,
> 2016-09-26 13:02:25.601351 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG1","00000","getLocalTmpDirFromSegmentConfig
>  session_id:143 command_id:74 qeidx:0 
> tmpdir:/tmp/hawq/segment",,,,,,,0,,"identity.c",418,
> 2016-09-26 13:02:25.601784 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG3","00000","StartTransactionCommand",,,,,,"explain
>  analyze select * from number;",0,,"postgres.c",3107,
> 2016-09-26 13:02:25.602075 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","StartTransaction",,,,,,"explain
>  analyze select * from number;",0,,"xact.c",5103,
> 2016-09-26 13:02:25.602195 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","name: unnamed; 
> blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 6228/1/0, nestlvl: 
> 1, children: <>",,,,,,"explain analyze select * from 
> number;",0,,"xact.c",5128,
> 2016-09-26 13:02:25.602578 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 0 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602703 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 1 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602836 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 2 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602994 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 3 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603104 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 4 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603211 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 5 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603317 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 6 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603572 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 7 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603751 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 8 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603881 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 9 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604003 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 10 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604110 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 11 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604216 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 12 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604323 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 13 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604555 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 14 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604697 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 15 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604848 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 16 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604959 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 17 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.605064 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
> index 18 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.605591 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","Resource 
> enforcer finds cpu sub-system is disabled",,,,,,"explain analyze select * 
> from number;",0,,"resourceenforcer.c",908,
> 2016-09-26 13:02:25.605716 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Current nice 
> level of the process: 19",,,,,,"explain analyze select * from 
> number;",0,,"postgres.c",283,
> 2016-09-26 13:02:25.605856 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Reniced 
> process to level 19",,,,,,"explain analyze select * from 
> number;",0,,"postgres.c",302,
> 2016-09-26 13:02:25.606073 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","GetSnapshotData
>  setting globalxmin and xmin to 6228",,,,,,"explain analyze select * from 
> number;",0,,"procarray.c",552,
> 2016-09-26 13:02:25.606306 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Inserted entry 
> for query (sessionid=143, commandcnt=74)",,,,,,"explain analyze select * from 
> number;",0,,"workfile_queryspace.c",283,
> 2016-09-26 13:02:25.606748 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Have 
> both IPv6 and IPv4 choices",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",1291,
> 2016-09-26 13:02:25.606978 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket 
> ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * 
> from number;",0,,"ic_udp.c",1303,
> 2016-09-26 13:02:25.607098 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket 
> 6 ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * 
> from number;",0,,"ic_udp.c",1307,
> 2016-09-26 13:02:25.607207 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","bind 
> addrlen 28 fam 10",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",1318,
> 2016-09-26 13:02:25.607320 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> default buffer size 124928 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2200,
> 2016-09-26 13:02:25.607555 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> use buffer size 2097152 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2215,
> 2016-09-26 13:02:25.607678 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> default buffer size 124928 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2200,
> 2016-09-26 13:02:25.607787 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
> use buffer size 2097152 bytes",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",2215,
> 2016-09-26 13:02:25.607939 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","GetSockAddr 
> socket ai_family 2 ai_socktype 2 ai_protocol 17 for 
> 172.28.21.157",,,,,,"explain analyze select * from 
> number;",0,,"ic_udp.c",3058,
> 2016-09-26 13:02:25.608052 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","We 
> are inet6, remote is inet.  Converting to v4 mapped address.",,,,,,"explain 
> analyze select * from number;",0,,"ic_udp.c",3137,
> 2016-09-26 13:02:25.608249 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 0 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608706 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 1 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608836 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 2 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608966 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 3 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609083 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 4 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609200 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 5 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609316 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 6 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609657 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
> index 7 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.613152 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  12:32:31 
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","Parquet 
> metadata file footer length index: 198",,,,,,"explain analyze select * from 
> number;",0,,"cdbparquetfooterprocessor.c",141,
> 2016-09-26 13:02:25.676719 
> PDT,,,p380675,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error 
> log:
> 2016-09-26 13:02:25.676477, p384452, th140708219193472, ERROR cannot setup 
> block reader for Block: [block pool ID: 
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186] file 
> /hawq_default/16385/16543/17000/10 on Datanode: hdw2.hdp.local(172.28.21.155).
> LocalBlockReader.cpp: 127: HdfsIOException: Failed to construct 
> LocalBlockReader for block: [block pool ID: 
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
>         @       
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
>  const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
>         @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
>         @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, 
> bool)
>         @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
>         @       Hdfs::Internal::InputStreamImpl::read(char*, int)
>         @       hdfsRead
>         @       gpfs_hdfs_read
>         @       HdfsRead
>         @       FileRead
>         @       readParquetFooter
>         @       ParquetStorageRead_OpenFile
>         @       parquet_getnext
>         @       ParquetScanNext
>         @       ExecTableScan
>         @       ExecProcNode
>         @       ExecMotion
>         @       ExecProcNode
>         @       ExecutePlan
>         @       ExecutorRun
>         @       PortalRunSelect
>         @       PortalRun
>         @       PostgresMain
>         @       BackendStartup
>         @       ServerLoop
>         @       PostmasterMain
>         @       main
>         @       __libc_start_main
>         @       Unknown
> Caused by
> LocalBlockReader.cpp: 283: HdfsIOException: LocalBlockReader failed to skip 
> from position: 0, length: 0, block: [block pool ID: 
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
>         @       Hdfs::Internal::LocalBlockReader::skip(long)
>         @       
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
>  const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
>         @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
>         @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, 
> bool)
>         @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
>         @       Hdfs::Internal::InputStreamImpl::read(char*, int)
>         @       hdfsRead
>         @       gpfs_hdfs_read
>         @       HdfsRead
>         @       FileRead
>         @       readParquetFooter
>         @       ParquetStorageRead_OpenFile
>         @       parquet_getnext
>         @       ParquetScanNext
>         @       ExecTableScan
>         @       ExecProcNode
>         @       ExecMotion
>         @       ExecProcNode
>         @       ExecutePlan
>         @       ExecutorRun
>         @       PortalRunSelect
>         @       PortalRun
>         @       PostgresMain
>         @       BackendStartup
>         @       ServerLoop
>         @       PostmasterMain
>         @       main
>         @       __libc_start_main
>         @       Unknown
> Caused by
> LocalBlockReader.cpp: 156: ChecksumException: LocalBlockReader checksum not 
> match for block: [block pool ID: BP-2023073008-172.28.21.63-1462922052672 
> block ID 1073742008_1186]
>         @       Hdfs::Internal::LocalBlockReader::readAndVerify(int)
>         @       Hdfs::Internal::LocalBlockReader::skip(long)
>         @       
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
>  const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
>         @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
>         @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, 
> bool)
>         @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
>         @       Hdfs::Internal::InputStreamImpl::read(char*, int)
>         @       hdfsRead
>         @       gpfs_hdfs_read
>         @       HdfsRead
>         @       FileRead
>         @       readParquetFooter
>         @       ParquetStorageRead_OpenFile
>         @       parquet_getnext
>         @       ParquetScanNext
>         @       ExecTableScan
>         @       ExecProcNode
>         @       ExecMotion
>         @       ExecProcNode
>         @       ExecutePlan
>         @       ExecutorRun
>         @       PortalRunSelect
>         @       PortalRun
>         @       PostgresMain
>         @       BackendStartup
>         @       ServerLoop
>         @       PostmasterMain
>         @       main
>         @       __libc_start_main
>         @       Unknown
> retry the same node but disable read shortcircuit 
> feature",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2016-09-26 13:02:25.680638 
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>  
> {code}
> Even though it correctly detected the bad checksum using the 
> LocalBlockReader, when it calls the RemoteBlockReader it does not appear to 
> detect the bad checksum, and the read is allowed to go through.
> {code}
> sr37228_repro=# select * from number;
>     id
> ----------
>  16777217
>  16777216
>         0
>         1
>         1
> (5 rows)
> Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
> Connecting to namenode via 
> http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
> FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path 
> /hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
> /hawq_default/16385/16543/17000/10 206 bytes, 1 block(s):  OK
> 0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 
> repl=3 
> [DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
>  
> DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
>  
> DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
> Status: HEALTHY
>  Total size:    206 B
>  Total dirs:    0
>  Total files:   1
>  Total symlinks:                0
>  Total blocks (validated):      1 (avg. block size 206 B)
>  Minimally replicated blocks:   1 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.0
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
> The filesystem under path '/hawq_default/16385/16543/17000/10' is HEALTHY
> {code}
> The behavior of InputStreamImpl::setupBlockReader appears to be to: 
> 1. Attempt to read the block locally using LocalBlockReader
> 2. If the local block read fails, attempt to read the block from the next 
> available node using RemoteBlockReader
> 3. Continue to read all the available blocks using RemoteBlockReader until we 
> have no more blocks to read.
> In this case, the RemoteBlockReader appears to ignore the bad checksum in the 
> block, and returns wrong results.
> Questions:
> 1. When we detect a bad checksum on the local block, why do we not mark the 
> block as corrupt with the NameNode?
> 2. When we read the block using RemoteBlockReader, why doesn't it detect the 
> bad block?
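
To make the two questions above concrete, here is a minimal Python sketch of a read path in which every replica, local or remote, is checksum-verified before its data is returned, and failing replicas are collected so that they could be reported. All names are hypothetical, the replica dictionaries are made up for illustration, and zlib.crc32 is only a stand-in for the HDFS checksum. This is not the libhdfs3 implementation.

```python
import zlib


class ChecksumError(Exception):
    """Raised when a replica's data does not match its stored checksum."""


def checksum(data):
    # Stand-in checksum for this sketch (assumption: not the HDFS algorithm).
    return zlib.crc32(data) & 0xFFFFFFFF


def read_replica(replica):
    # replica is a hypothetical dict: {"node": str, "data": bytes, "crc": int}
    data = replica["data"]
    if checksum(data) != replica["crc"]:
        raise ChecksumError(replica["node"])
    return data


def read_block(replicas):
    """Return (data, bad_nodes) using the first replica that verifies.

    bad_nodes lists the replicas that failed verification on the way, which
    is the information a client would need to report them as corrupt.
    """
    bad_nodes = []
    for replica in replicas:
        try:
            return read_replica(replica), bad_nodes
        except ChecksumError:
            bad_nodes.append(replica["node"])
    raise OSError("all replicas failed checksum verification: %s" % bad_nodes)
```

With two damaged replicas and one good one, this sketch returns the good data together with the two failing nodes. If every replica fails verification it raises an error instead of returning wrong results.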



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
