[https://issues.apache.org/jira/browse/HAWQ-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726541#comment-16726541]
Ming LI commented on HAWQ-1094:
-------------------------------
About point 3): according to
[https://community.hortonworks.com/questions/17917/best-way-of-handling-corrupt-or-missing-blocks.html],
'hdfs fsck' reports a block as "corrupt" only if all three replicas are
damaged, i.e. when HDFS cannot self-heal the block from any of its replicas.
So if only one replica has an error, fsck will not report the block as corrupt.
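The rule quoted above can be modeled in a few lines. This is a simplified sketch, not HDFS's fsck code: it assumes the NameNode already knows which replicas are corrupt (on-disk damage that nobody has reported yet is invisible to fsck), and the names ReplicaState, BlockHealth and classifyBlock are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of how fsck labels a block, following the rule quoted above.
enum class ReplicaState { Good, Corrupt };
enum class BlockHealth { Healthy, HasCorruptReplicas, Corrupt };

// A block is "corrupt" only when no good replica is left to heal it from.
BlockHealth classifyBlock(const std::vector<ReplicaState>& replicas) {
    auto bad = std::count(replicas.begin(), replicas.end(), ReplicaState::Corrupt);
    if (bad == 0) return BlockHealth::Healthy;
    if (static_cast<std::size_t>(bad) == replicas.size()) return BlockHealth::Corrupt;
    // At least one good replica remains, so the block is still recoverable.
    return BlockHealth::HasCorruptReplicas;
}
```

With two of three replicas damaged, as in the reproduction below, classifyBlock returns HasCorruptReplicas rather than Corrupt, which matches the behavior described in this comment.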
> Select on INTERNAL table returns wrong results when hdfs blocks have checksum errors
> ------------------------------------------------------------------------------------
>
> Key: HAWQ-1094
> URL: https://issues.apache.org/jira/browse/HAWQ-1094
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Fault Tolerance
> Reporter: Ming LI
> Assignee: Ming LI
> Priority: Major
> Fix For: backlog
>
>
> I created a parquet table and inserted the following values into the table:
> {code}
> sr37228_repro=# select * from number;
> id
> ----
> 1
> 1
> 1
> 1
> 1
> (5 rows)
> {code}
> I then modified the data in two of the three replicas of the file's single
> block and tried reading the data again.
> {code}
> Modifying contents of internal table blocks...
> Found hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 in hdfs
> Modifying block
> /hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
> on 172.28.21.155
> block_script.sh
> 100% 228 0.2KB/s 00:00
> Modifying block
> /hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
> on 172.28.21.156
> block_script.sh
> 100% 228 0.2KB/s 00:00
> Running count query again, this time with bad data in two of the three blocks
> count | id
> -------+----------
> 1 | 0
> 2 | 1
> 1 | 16777216
> 1 | 16777217
> (4 rows)
> Checking Showing file health:
> Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
> Connecting to namenode via
> http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
> FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path
> /hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
> /hawq_default/16385/16543/17000/10 206 bytes, 1 block(s): OK
> 0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206
> repl=3
> [DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
> DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
> DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
> Status: HEALTHY
> Total size: 206 B
> Total dirs: 0
> Total files: 1
> Total symlinks: 0
> Total blocks (validated): 1 (avg. block size 206 B)
> Minimally replicated blocks: 1 (100.0 %)
> Over-replicated blocks: 0 (0.0 %)
> Under-replicated blocks: 0 (0.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 3
> Average block replication: 3.0
> Corrupt blocks: 0
> Missing replicas: 0 (0.0 %)
> Number of data-nodes: 3
> Number of racks: 1
> FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
> {code}
> When setupBlockReader reads a bad block using the LocalBlockReader, the
> reader correctly detects a bad checksum.
> {code}
> 2016-09-26 13:02:09.267021
> PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource
> manager discovered local host IPv4 address
> 127.0.0.1",,,,,,,0,,"network_utils.c",210,
> 2016-09-26 13:02:09.267171
> PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource
> manager discovered local host IPv4 address
> 172.28.21.155",,,,,,,0,,"network_utils.c",210,
> 2016-09-26 13:02:16.239048
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping
> in memory mapping OidInMemHeapMapping",,,,,,"SET log_min_messages TO
> 'debug5'",0,,"cdbinmemheapam.c",293,
> 2016-09-26 13:02:16.239289
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransactionCommand",,,,,,"SET
> log_min_messages TO 'debug5'",0,,"postgres.c",3131,
> 2016-09-26 13:02:16.239435
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransaction",,,,,,"SET
> log_min_messages TO 'debug5'",0,,"xact.c",5103,
> 2016-09-26 13:02:16.239819
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","name:
> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 6227/1/0,
> nestlvl: 1, children: <>",,,,,,"SET log_min_messages TO
> 'debug5'",0,,"xact.c",5128,
> 2016-09-26 13:02:16.239978
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping
> in memory mapping OidInMemOnlyMapping",,,,,,"SET log_min_messages TO
> 'debug5'",0,,"cdbinmemheapam.c",293,
> 2016-09-26 13:02:25.600367
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,0,con143,,seg1,,,,,"DEBUG5","00000","First char: 'M'; gp_role =
> 'execute'.",,,,,,,0,,"postgres.c",4737,
> 2016-09-26 13:02:25.600639
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","Message type M
> received by from libpq, len = 1412",,,,,,,0,,"postgres.c",4813,
> 2016-09-26 13:02:25.600742
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG5","00000","MPP dispatched stmt
> from QD: explain analyze select * from number;.",,,,,,,0,,"postgres.c",4893,
> 2016-09-26 13:02:25.600847
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","SetupProcessIdentity:
> receive msg:
> ProcessIdentity_Begin_slice_1_idx_0_gang_1_cmd_74_writer_t_End_ProcessIdentity",,,,,,,0,,"identity.c",365,
> 2016-09-26 13:02:25.600997
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity is
> not init",,,,,,,0,,"identity.c",599,
> 2016-09-26 13:02:25.601129
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity:
> slice 1 id 0 gang num 1 writer t",,,,,,,0,,"identity.c",602,
> 2016-09-26 13:02:25.601250
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG5","00000","Get a temporary
> directory:/tmp/hawq/segment",,,,,,,0,,"cdbtmpdir.c",48,
> 2016-09-26 13:02:25.601351
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG1","00000","getLocalTmpDirFromSegmentConfig
> session_id:143 command_id:74 qeidx:0
> tmpdir:/tmp/hawq/segment",,,,,,,0,,"identity.c",418,
> 2016-09-26 13:02:25.601784
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG3","00000","StartTransactionCommand",,,,,,"explain
> analyze select * from number;",0,,"postgres.c",3107,
> 2016-09-26 13:02:25.602075
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","StartTransaction",,,,,,"explain
> analyze select * from number;",0,,"xact.c",5103,
> 2016-09-26 13:02:25.602195
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","name: unnamed;
> blockState: DEFAULT; state: INPROGR, xid/subid/cid: 6228/1/0, nestlvl:
> 1, children: <>",,,,,,"explain analyze select * from
> number;",0,,"xact.c",5128,
> 2016-09-26 13:02:25.602578
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 0 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602703
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 1 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602836
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 2 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.602994
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 3 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603104
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 4 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603211
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 5 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603317
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 6 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603572
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 7 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603751
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 8 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.603881
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 9 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604003
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 10 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604110
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 11 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604216
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 12 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604323
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 13 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604555
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 14 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604697
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 15 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604848
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 16 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.604959
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 17 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.605064
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
> index 18 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
> number;",0,,"cdbinmemheapam.c",624,
> 2016-09-26 13:02:25.605591
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","Resource
> enforcer finds cpu sub-system is disabled",,,,,,"explain analyze select *
> from number;",0,,"resourceenforcer.c",908,
> 2016-09-26 13:02:25.605716
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Current nice
> level of the process: 19",,,,,,"explain analyze select * from
> number;",0,,"postgres.c",283,
> 2016-09-26 13:02:25.605856
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Reniced
> process to level 19",,,,,,"explain analyze select * from
> number;",0,,"postgres.c",302,
> 2016-09-26 13:02:25.606073
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","GetSnapshotData
> setting globalxmin and xmin to 6228",,,,,,"explain analyze select * from
> number;",0,,"procarray.c",552,
> 2016-09-26 13:02:25.606306
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Inserted entry
> for query (sessionid=143, commandcnt=74)",,,,,,"explain analyze select * from
> number;",0,,"workfile_queryspace.c",283,
> 2016-09-26 13:02:25.606748
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Have
> both IPv6 and IPv4 choices",,,,,,"explain analyze select * from
> number;",0,,"ic_udp.c",1291,
> 2016-09-26 13:02:25.606978
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket
> ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select *
> from number;",0,,"ic_udp.c",1303,
> 2016-09-26 13:02:25.607098
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket
> 6 ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select *
> from number;",0,,"ic_udp.c",1307,
> 2016-09-26 13:02:25.607207
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","bind
> addrlen 28 fam 10",,,,,,"explain analyze select * from
> number;",0,,"ic_udp.c",1318,
> 2016-09-26 13:02:25.607320
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit
> default buffer size 124928 bytes",,,,,,"explain analyze select * from
> number;",0,,"ic_udp.c",2200,
> 2016-09-26 13:02:25.607555
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit
> use buffer size 2097152 bytes",,,,,,"explain analyze select * from
> number;",0,,"ic_udp.c",2215,
> 2016-09-26 13:02:25.607678
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit
> default buffer size 124928 bytes",,,,,,"explain analyze select * from
> number;",0,,"ic_udp.c",2200,
> 2016-09-26 13:02:25.607787
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit
> use buffer size 2097152 bytes",,,,,,"explain analyze select * from
> number;",0,,"ic_udp.c",2215,
> 2016-09-26 13:02:25.607939
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","GetSockAddr
> socket ai_family 2 ai_socktype 2 ai_protocol 17 for
> 172.28.21.157",,,,,,"explain analyze select * from
> number;",0,,"ic_udp.c",3058,
> 2016-09-26 13:02:25.608052
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","We
> are inet6, remote is inet. Converting to v4 mapped address.",,,,,,"explain
> analyze select * from number;",0,,"ic_udp.c",3137,
> 2016-09-26 13:02:25.608249
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 0 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608706
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 1 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608836
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 2 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.608966
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 3 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609083
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 4 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609200
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 5 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609316
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 6 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.609657
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
> index 7 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
> from number;",0,,"cdbinmemheapam.c",499,
> 2016-09-26 13:02:25.613152
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
> 12:32:31
> PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","Parquet
> metadata file footer length index: 198",,,,,,"explain analyze select * from
> number;",0,,"cdbparquetfooterprocessor.c",141,
> 2016-09-26 13:02:25.676719
> PDT,,,p380675,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error
> log:
> 2016-09-26 13:02:25.676477, p384452, th140708219193472, ERROR cannot setup
> block reader for Block: [block pool ID:
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186] file
> /hawq_default/16385/16543/17000/10 on Datanode: hdw2.hdp.local(172.28.21.155).
> LocalBlockReader.cpp: 127: HdfsIOException: Failed to construct
> LocalBlockReader for block: [block pool ID:
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
> @
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
> const&, Hdfs::Internal::ExtendedBlock const&, long, bool,
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
> @ Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
> @ Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int,
> bool)
> @ Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
> @ Hdfs::Internal::InputStreamImpl::read(char*, int)
> @ hdfsRead
> @ gpfs_hdfs_read
> @ HdfsRead
> @ FileRead
> @ readParquetFooter
> @ ParquetStorageRead_OpenFile
> @ parquet_getnext
> @ ParquetScanNext
> @ ExecTableScan
> @ ExecProcNode
> @ ExecMotion
> @ ExecProcNode
> @ ExecutePlan
> @ ExecutorRun
> @ PortalRunSelect
> @ PortalRun
> @ PostgresMain
> @ BackendStartup
> @ ServerLoop
> @ PostmasterMain
> @ main
> @ __libc_start_main
> @ Unknown
> Caused by
> LocalBlockReader.cpp: 283: HdfsIOException: LocalBlockReader failed to skip
> from position: 0, length: 0, block: [block pool ID:
> BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
> @ Hdfs::Internal::LocalBlockReader::skip(long)
> @
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
> const&, Hdfs::Internal::ExtendedBlock const&, long, bool,
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
> @ Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
> @ Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int,
> bool)
> @ Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
> @ Hdfs::Internal::InputStreamImpl::read(char*, int)
> @ hdfsRead
> @ gpfs_hdfs_read
> @ HdfsRead
> @ FileRead
> @ readParquetFooter
> @ ParquetStorageRead_OpenFile
> @ parquet_getnext
> @ ParquetScanNext
> @ ExecTableScan
> @ ExecProcNode
> @ ExecMotion
> @ ExecProcNode
> @ ExecutePlan
> @ ExecutorRun
> @ PortalRunSelect
> @ PortalRun
> @ PostgresMain
> @ BackendStartup
> @ ServerLoop
> @ PostmasterMain
> @ main
> @ __libc_start_main
> @ Unknown
> Caused by
> LocalBlockReader.cpp: 156: ChecksumException: LocalBlockReader checksum not
> match for block: [block pool ID: BP-2023073008-172.28.21.63-1462922052672
> block ID 1073742008_1186]
> @ Hdfs::Internal::LocalBlockReader::readAndVerify(int)
> @ Hdfs::Internal::LocalBlockReader::skip(long)
> @
> Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
> const&, Hdfs::Internal::ExtendedBlock const&, long, bool,
> Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
> @ Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
> @ Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int,
> bool)
> @ Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
> @ Hdfs::Internal::InputStreamImpl::read(char*, int)
> @ hdfsRead
> @ gpfs_hdfs_read
> @ HdfsRead
> @ FileRead
> @ readParquetFooter
> @ ParquetStorageRead_OpenFile
> @ parquet_getnext
> @ ParquetScanNext
> @ ExecTableScan
> @ ExecProcNode
> @ ExecMotion
> @ ExecProcNode
> @ ExecutePlan
> @ ExecutorRun
> @ PortalRunSelect
> @ PortalRun
> @ PostgresMain
> @ BackendStartup
> @ ServerLoop
> @ PostmasterMain
> @ main
> @ __libc_start_main
> @ Unknown
> retry the same node but disable read shortcircuit
> feature",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2016-09-26 13:02:25.680638
> PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
>
> {code}
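The ChecksumException thrown from LocalBlockReader::readAndVerify above comes from HDFS's per-chunk checksums: each replica's data file has a companion metadata file holding one checksum per dfs.bytes-per-checksum bytes (512 by default; CRC32C is the default checksum type in Hadoop 2.x). Assuming the repro script edited the block data file but not its checksum file, any reader that verifies chunks will see a mismatch. A minimal sketch of that verification follows; crc32c, chunkChecksums and verifyChunks are hypothetical names, and this is not libhdfs3's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
std::uint32_t crc32c(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// One checksum per chunk of bytesPerChecksum bytes; the last chunk may be shorter.
std::vector<std::uint32_t> chunkChecksums(const std::string& data,
                                          std::size_t bytesPerChecksum = 512) {
    std::vector<std::uint32_t> sums;
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(data.data());
    for (std::size_t off = 0; off < data.size(); off += bytesPerChecksum) {
        std::size_t n = std::min(bytesPerChecksum, data.size() - off);
        sums.push_back(crc32c(bytes + off, n));
    }
    return sums;
}

// True only if every chunk matches the stored checksums (and the chunk count agrees).
bool verifyChunks(const std::string& data, const std::vector<std::uint32_t>& stored,
                  std::size_t bytesPerChecksum = 512) {
    return chunkChecksums(data, bytesPerChecksum) == stored;
}
```

For the 206-byte block in this report there is a single chunk, so changing any byte of the data without updating the stored checksum makes verifyChunks return false.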
> Even though the LocalBlockReader correctly detected the bad checksum, when the
> client then falls back to the RemoteBlockReader it does not appear to detect
> the bad checksum, and the read is allowed to go through.
> {code}
> sr37228_repro=# select * from number;
> id
> ----------
> 16777217
> 16777216
> 0
> 1
> 1
> (5 rows)
> Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
> Connecting to namenode via
> http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
> FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path
> /hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
> /hawq_default/16385/16543/17000/10 206 bytes, 1 block(s): OK
> 0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206
> repl=3
> [DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
> DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
> DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
> Status: HEALTHY
> Total size: 206 B
> Total dirs: 0
> Total files: 1
> Total symlinks: 0
> Total blocks (validated): 1 (avg. block size 206 B)
> Minimally replicated blocks: 1 (100.0 %)
> Over-replicated blocks: 0 (0.0 %)
> Under-replicated blocks: 0 (0.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 3
> Average block replication: 3.0
> Corrupt blocks: 0
> Missing replicas: 0 (0.0 %)
> Number of data-nodes: 3
> Number of racks: 1
> FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
> The filesystem under path '/hawq_default/16385/16543/17000/10' is HEALTHY
> {code}
> The behavior of InputStreamImpl::setupBlockReader appears to be:
> 1. Attempt to read the block locally using LocalBlockReader.
> 2. If the local block read fails, attempt to read the block using
> RemoteBlockReader (per the log above, the retry goes to the same node with
> short-circuit reads disabled).
> 3. Continue reading all the remaining blocks using RemoteBlockReader until
> there are no more blocks to read.
> In this case, the RemoteBlockReader appears to ignore the bad checksum in the
> block and returns wrong results.
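The expected behavior can be sketched as a failover loop in which every path, short-circuit or remote, must pass the same verification before its bytes are returned. This is a hypothetical model, not libhdfs3 code: Replica, ReadResult and readVerified are illustrative names, and verify stands in for the per-chunk checksum comparison against each replica's stored checksums.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical replica handle: which node holds it and how to fetch its bytes.
struct Replica {
    std::string node;
    std::function<std::string()> fetch;  // short-circuit or remote read
};

struct ReadResult {
    std::string data;
    std::vector<std::string> badNodes;  // replicas that failed verification
};

// Try replicas in the caller's order (local first). Data from any path is
// returned only after verify() accepts it; failed replicas are recorded.
ReadResult readVerified(const std::vector<Replica>& replicas,
                        const std::function<bool(const std::string&)>& verify) {
    ReadResult result;
    for (const Replica& r : replicas) {
        std::string bytes = r.fetch();
        if (verify(bytes)) {
            result.data = bytes;
            return result;
        }
        result.badNodes.push_back(r.node);  // candidate to report as corrupt
    }
    throw std::runtime_error("no replica passed checksum verification");
}
```

Under this model, the two modified replicas on 172.28.21.155 and 172.28.21.156 would be rejected and the read would be served from the untouched replica on 172.28.19.211, instead of returning the altered rows.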
> Questions:
> 1. When we detect a bad checksum on the local block, why do we not report the
> replica as corrupt to the NameNode?
> 2. When we read the block using RemoteBlockReader, why doesn't it detect the
> bad checksum?
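Question 1 matters because the NameNode only learns about a damaged replica when something reports it, which is consistent with fsck printing HEALTHY above. The sketch below is a hypothetical model of the NameNode-side bookkeeping, not HDFS code (BlockRecord and its methods are illustrative names): once the two bad replicas are reported, the block has one live replica, so it would be under-replicated and eligible for re-replication from the good copy, but still not "corrupt" in fsck's sense.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical NameNode-side view of one block after clients report bad replicas.
struct BlockRecord {
    int replicationFactor;
    std::set<std::string> locations;        // DataNodes holding a replica
    std::set<std::string> corruptReplicas;  // replicas reported as corrupt

    // Only replicas the NameNode knows about can be marked corrupt.
    void reportCorrupt(const std::string& node) {
        if (locations.count(node)) corruptReplicas.insert(node);
    }
    int liveReplicas() const {
        return static_cast<int>(locations.size() - corruptReplicas.size());
    }
    // Fewer live replicas than the target: copies should be made from a good replica.
    bool needsReplication() const { return liveReplicas() < replicationFactor; }
    // The block itself is corrupt only when no live replica remains.
    bool isCorrupt() const { return !locations.empty() && liveReplicas() == 0; }
};
```

Without any report, the same record shows three live replicas and nothing to repair, which is the state fsck displayed in both runs above.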
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)