On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark <st...@mit.edu> wrote:
> On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis <pg...@j-davis.com> wrote:
> > There is a lot of difference between those two. In particular, it looks
> > like the problem you are seeing is coming from the background writer,
> > which is not running during initdb.
>
> The difference that comes to mind is that the postmaster forks. If the
> library opens any connections prior to forking and then uses them
> after forking that would work at first but it would get confused
> quickly once more than one backend tries to use the same connection.
> The data being sent would all be mixed together and they would see
> responses to requests other processes sent.
> You need to ensure that any network connections are opened up *after*
> the new processes are forked.
>
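
The ordering described above (open connections only after the fork, never share one across it) can be sketched in plain POSIX C. This is only an illustration under assumptions: the Conn type and the get_conn() and child_gets_own_conn() helpers are hypothetical names, a pipe stands in for the real network socket, and nothing here is HDFS or PostgreSQL API. It shows the generic pattern of tagging a handle with the PID that opened it, so that a forked child notices an inherited handle and opens its own instead of reusing the parent's.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library connection; a pipe plays the socket. */
typedef struct {
    pid_t owner;   /* PID of the process that opened it; 0 = never opened */
    int   fds[2];  /* pipe ends standing in for a network socket */
} Conn;

static Conn conn = { 0, { -1, -1 } };

/* Return a connection owned by the calling process, opening one if needed.
 * A handle inherited across fork() carries the parent's PID, so the child
 * closes its copies of the descriptors and opens a fresh connection. */
static Conn *get_conn(void)
{
    pid_t me = getpid();

    if (conn.owner != me) {
        if (conn.owner != 0) {      /* inherited from the parent: drop it */
            close(conn.fds[0]);
            close(conn.fds[1]);
        }
        if (pipe(conn.fds) != 0)
            return NULL;
        conn.owner = me;
    }
    assert(conn.owner == me);
    return &conn;
}

/* Open in the parent, fork, and report whether the child ended up with its
 * own connection. Returns 0 on success, nonzero otherwise. */
static int child_gets_own_conn(void)
{
    pid_t parent = getpid();
    pid_t pid;
    int status;

    if (get_conn() == NULL)         /* parent opens before forking */
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        Conn *c = get_conn();       /* child must not reuse the parent's handle */
        _exit((c != NULL && c->owner == getpid() && c->owner != parent) ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling child_gets_own_conn() from main() should return 0 when the child replaced the inherited handle with its own. The sketch only covers descriptor sharing; a library that also keeps threads or other per-process state alive across fork() would need more than this.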
That's right. It turned out that the cause of the problem is that HDFS has trouble dealing with forked processes. However, there is no clear suggestion on how to fix this. I attached gdb to the writer process and got the following backtrace:

#0  0xb76f0430 in __kernel_vsyscall ()
#1  0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2  0x0840ab46 in pg_usleep (microsec=200000) at pgsleep.c:43
#3  0x0829ca9a in BgWriterNap () at bgwriter.c:642
#4  0x0829c882 in BackgroundWriterMain () at bgwriter.c:540
#5  0x0811b0ec in AuxiliaryProcessMain (argc=2, argv=0xbf982308) at bootstrap.c:417
#6  0x082a9af1 in StartChildProcess (type=BgWriterProcess) at postmaster.c:4427
#7  0x082a75de in reaper (postgres_signal_arg=17) at postmaster.c:2390
#8  <signal handler called>
#9  0xb76f0430 in __kernel_vsyscall ()
#10 0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#11 0x082a5b62 in ServerLoop () at postmaster.c:1391
#12 0x082a53e2 in PostmasterMain (argc=3, argv=0xa525c28) at postmaster.c:1092
#13 0x0822dfa8 in main (argc=3, argv=0xa525c28) at main.c:188

Any ideas?