[ https://issues.apache.org/jira/browse/HAWQ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming LI resolved HAWQ-1345.
---------------------------
    Resolution: Fixed

> Cannot connect to PSQL: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory
> --------------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1345
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1345
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: 2.0.0.0-incubating
>            Reporter: Amy
>            Assignee: Ming LI
>             Fix For: backlog
>
>
> We are unable to connect with psql to the current database.
> We can connect to the template1 database with psql, but connecting to the current database fails with the following error:
> {code}
> #psql
> psql: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory
> {code}
> When we fail over to the standby and start the HAWQ master, we get the same error again:
> {code}
> 2017-02-17 02:12:50.119207 PST,,,p22482,th-1681897184,,,,0,,,seg-10000,,,,,"DEBUG1","00000","opening ""pg_xlog/00000001000000050000001D"" for reading (log 5, seg 29)",,,,,,,0,,"xlog.c",3162,
> 2017-02-17 02:12:50.176450 PST,,,p22482,th-1681897184,,,,0,,,seg-10000,,,,,"FATAL","42809","could not count blocks of relation 1663/16508/1249: Not a directory",,,,,"xlog redo insert: rel 1663/16508/1249; tid 32682/85
> REDO PASS 3 @ 5/7669B838; LSN 5/7669E480: prev 5/76694C98; xid 825193; bkpb1: Heap - insert: rel 1663/16508/1249; tid 32682/85",,0,,"smgr.c",1146,"
> Stack trace:
> 1 0x8c5628 postgres errstart + 0x288
> 2 0x7ddfbc postgres smgrnblocks + 0x3c
> 3 0x4fbdf8 postgres XLogReadBuffer + 0x18
> 4 0x4ea2c9 postgres <symbol not found> + 0x4ea2c9
> 5 0x4eaf47 postgres <symbol not found> + 0x4eaf47
> 6 0x4f8af3 postgres StartupXLOG_Pass3 + 0x153
> 7 0x4fb277 postgres StartupProcessMain + 0x187
> 8 0x557cd8 postgres AuxiliaryProcessMain + 0x478
> 9 0x793c40 postgres <symbol not found> + 0x793c40
> 10 0x798901 postgres <symbol not found> + 0x798901
> 11 0x79a8c9 postgres PostmasterMain + 0x759
> 12 0x4a4039 postgres main + 0x519
> 13 0x7f3b979e1d5d libc.so.6 __libc_start_main + 0xfd
> 14 0x4a40b9 postgres <symbol not found> + 0x4a40b9
> "
> {code}
> On both the master and the standby, the pg_attribute file of the current database (1663/16508/1249) has reached 1 GB in size:
> {code}
> [gpadmin@master]$pwd
> /data/hawq/master
> [gpadmin@master master]$ cd base
> [gpadmin@master base]$ ls
> 1 16386 16387 16508
> [gpadmin@master base]$ cd 16508
> [gpadmin@master 16508]$ ls -thrl 1249
> -rw------- 1 gpadmin gpadmin 1.0G Feb 16 18:24 1249
> {code}
> Running the postgres backend in single-user mode under strace shows the following:
> {code}
> [gpadmin@master master]$ strace /usr/local/hawq/bin/postgres --single -P -O -p 5432 -D $MASTER_DATA_DIRECTORY -c gp_session_role=utility currentdatabase <<EOF
> select version();
> EOF
> (...)
> open("base/16508/pg_internal.init", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("base/16508/1259", O_RDWR) = 6
> lseek(6, 0, SEEK_END) = 188645376
> lseek(6, 0, SEEK_SET) = 0
> read(6, "\0\0\0\0\340\5\327\1\1\0\1\0\f\3@\3\0\200\4\2008\263P\1`\262\252\1\270\261P\1"..., 32768) = 32768
> open("base/16508/1249", O_RDWR) = 8
> lseek(8, 0, SEEK_END) = 1073741824
> open("base/16508/1249/1", O_RDWR) = -1 ENOTDIR (Not a directory)
> open("base/16508/1249/1", O_RDWR|O_CREAT, 0600) = -1 ENOTDIR (Not a directory)
> futex(0x7ff80e53f620, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> futex(0x7ff80e756af0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> open("/usr/share/locale/locale.alias", O_RDONLY) = 10
> fstat(10, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0
> {code}
> HAWQ is treating pg_attribute (base/16508/1249) as a directory even though it is a regular file: after lseek reports the file as exactly 1 GB (1073741824 bytes), the backend tries to open the next segment as base/16508/1249/1, and both open() calls fail with ENOTDIR.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)