[
https://issues.apache.org/jira/browse/HAWQ-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruilong Huo closed HAWQ-1324.
-
> Query cancel cause segment to go into Crash recovery
>
>
> Key: HAWQ-1324
> URL: https://issues.apache.org/jira/browse/HAWQ-1324
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.0.0.0-incubating
> Reporter: Ming LI
> Assignee: Ming LI
> Fix For: 2.1.0.0-incubating
>
>
> A query was cancelled due to the HDFS connection issue on Isilon shown below.
> Seg26 then went into crash recovery because an INSERT query was cancelled.
> What is the expected behaviour when HDFS becomes unavailable and a query
> fails as a result?
> Below is the HDFS error:
> {code}
> 2017-01-04 03:04:08.382615
> JST,"carund","dwhrun",p574246,th1862944896,"192.168.10.12","47554",2017-01-04
> 03:03:08 JST,0,con198952,,seg29,"FATAL","08006","connection to client
> lost",,,0,,"postgres.c",3518,
> 2017-01-04 03:04:08.420099
> JST,,,p755778,th18629448960,,,seg-1,"LOG","0","3rd party
> error log:
> 2017-01-04 03:04:08.419969, p574222, th140507423066240, ERROR Handle
> Exception: NamenodeImpl.cpp: 670: Unexpected error: status:
> STATUS_FILE_NOT_AVAILABLE = 0xC467 Path:
> hawq_default/16385/16563/802748/26 with path=
> ""/hawq_default/16385/16563/802748/26"",
> clientname=libhdfs3_client_random_866998528_count_1_pid_574222_tid_140507423066240
> @ Hdfs::Internal::UnWrapper Hdfs::HdfsIOException, Hdfs::Internal::Nothing, Hdfs::Internal::Nothing,
> Hdfs::Internal::Nothing, Hdfs::Internal::Nothing, Hdfs::Internal::Nothing ,
> Hdfs::Internal::Nothing, Hdfs::Internal::Nothing, Hdfs::Internal::Nothing,
> Hdfs::Internal::Nothing>::unwrap(char const, int)
> @ Hdfs::Internal::UnWrapper Hdfs::UnresolvedLinkException, Hdfs::HdfsIOException,
> Hdfs::Internal::Nothing, Hdfs::Internal::Nothing, Hdfs::Internal::Nothing,
> Hdfs::Internal::Nothing, Hdfs::Internal::Nothing, Hdfs::Internal::Nothing,
> Hdfs::Internal::Nothing, Hdfs::Internal::Nothing>::unwrap(char const, int)
> @ Hdfs::Internal::NamenodeImpl::fsync(std::string const&, std::string const&)
> @ Hdfs::Internal::NamenodeProxy::fsync(std::string const&, std::string const&)
> @ Hdfs::Internal::OutputStreamImpl::closePipeline()
> @ Hdfs::Internal::OutputStreamImpl::close()
> @ hdfsCloseFile
> @ gpfs_hdfs_closefile
> @ HdfsCloseFile
> @ HdfsFileClose
> @ CleanupTempFiles
> @ AbortTransaction
> @ AbortCurrentTransaction
> @ PostgresMain
> @ BackendStartup
> @ ServerLoop
> @ PostmasterMain
> @ main
> @ Unknown
> @ Unknown""SysLoggerMain","syslogger.c",518,
> 2017-01-04 03:04:08.420272
> JST,"carund","dwhrun",p574222,th1862944896,"192.168.10.12","47550",2017-01-04
> 03:03:08
> JST,40678725,con198952,cmd4,seg25,,,x40678725,sx1,"WARNING","58030","could
> not close file 7 : (hdfs://ffdlakehd.ffwin.fujifilm.co.jp:8020/hawq_default/16385/16563/802748/26) errno
> 5","Unexpected error: status: STATUS_FILE_NOT_AVAILABLE = 0xC467 Path:
> hawq_default/16385/16563/802748/26 with path=""/hawq_default/16385/16563/802748/26"",
> clientname=libhdfs3_client_random_866998528_count_1_pid_574222_tid_140507423066240",,0,,"fd.c",2762,
> {code}
> Segment 26 going into crash recovery (excerpt from the seg26 log file):
> {code}
> 2017-01-04 03:04:08.420314
> JST,"carund","dwhrun",p574222,th1862944896,"192.168.10.12","47550",2017-01-04
> 03:03:08
> JST,40678725,con198952,cmd4,seg25,,,x40678725,sx1,"LOG","08006","could not
> send data to client: Connection reset by peer",,,0,,"pqcomm.c",1292,
> 2017-01-04 03:04:08.420358
> JST,"carund","dwhrun",p574222,th1862944896,"192.168.10.12","47550",2017-01-04
> 03:03:08 JST,0,con198952,,seg25,"LOG","08006","could not send data to
> client: Broken pipe",,,0,,"pqcomm.c",1292,
> 2017-01-04 03:04:08.420375
> JST,"carund","dwhrun",p574222,th1862944896,"192.168.10.12","47550",2017-01-04
> 03:03:08 JST,0,con198952,,seg25,"FATAL","08006","connection to client
> lost",,,0,,"postgres.c",3518,
> 2017-01-04 03:04:08.950354
> JST,,,p755773,th18629448960,,,seg-1,"LOG","0","server process
> (PID 574240) was terminated by signal 11: Segmentation
> fault",,,0,,"postmaster.c",4748,
> 2017-01-04 03:04:08.950403
> JST,,,p755773,th18629448960,,,seg-1,"LOG","0","terminating
> any other active server processes",,,0,,"postmaster.c",4486,
> 2017-01-04 03:04:08.954044
> JST,,,p41605,th18629448960,,,seg-1,"LOG","0","Segment RM
> exits.",,,0,,"resourcemanager.c",340,
> 2017-01-04 03:04:08.954078
> JST,,,p41605,th18629448960,,,seg-1,"LOG","0","Clean up
> handler in message server is called.",,,0,,"rmcomm_MessageServer.c",105,
> 2017-01-04 03:04:08.972706