[jira] [Resolved] (HAWQ-637) RM process is error if property is missing in hawq-site.xml
[ https://issues.apache.org/jira/browse/HAWQ-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Wen resolved HAWQ-637. -- Resolution: Fixed > RM process is error if property is missing in hawq-site.xml > --- > > Key: HAWQ-637 > URL: https://issues.apache.org/jira/browse/HAWQ-637 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Lin Wen >Assignee: Lin Wen > > start hawq in yarn mode,yarn RM address is not configured in hawq-site.xml > scripts show start hawq successfully, but RM process is not correct. > ``` > gpadmin 235458 235448 0 08:48 ?00:00:00 postgres: port 5432, > master resource managercon4 error exit in 2m 0s > ``` > ``` > x86_64 libidn-1.18-2.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 > pam-1.1.1-13.el6.x86_64 zlib-1.2.3-29.el6.x86_64 > (gdb) bt > #0 0x00366e6e14d3 in __select_nocancel () from /lib64/libc.so.6 > #1 0x00b885d0 in pg_usleep (microsec=3000) at pgsleep.c:43 > #2 0x009dd9d8 in elog_debug_linger (edata=0x117c6c0) at elog.c:4129 > #3 0x009d6047 in errfinish (dummy=0) at elog.c:597 > #4 0x009d86b4 in elog_finish (elevel=21, fmt=0xdb4de0 "YARN mode > resource broker failed to start resource broker process. 
error=%d") at > elog.c:1463 > #5 0x00a5e96b in RB_LIBYARN_start (isforked=1 '\001') at > resourcebroker_LIBYARN.c:96 > #6 0x00a5d924 in RB_start (isforked=1 '\001') at > resourcebroker_API.c:58 > #7 0x00a9417f in MainHandlerLoop () at resourcemanager.c:545 > #8 0x00a940d0 in ResManagerMainServer2ndPhase () at > resourcemanager.c:513 > #9 0x00a93b64 in ResManagerMain (argc=3, argv=0x7fffdefaa6f0) at > resourcemanager.c:332 > #10 0x00a93d72 in ResManagerProcessStartup () at resourcemanager.c:400 > #11 0x0089525f in CommenceNormalOperations () at postmaster.c:3673 > #12 0x00895c77 in do_reaper () at postmaster.c:4021 > #13 0x0089203b in ServerLoop () at postmaster.c:2136 > #14 0x008911ae in PostmasterMain (argc=9, argv=0x3407940) at > postmaster.c:1454 > #15 0x007aaf1a in main (argc=9, argv=0x3407940) at main.c:226 > The fix is to let the RB and RM processes keep working normally, even though RB cannot register > itself with the Hadoop YARN RM because the configuration in hawq-site.xml is not correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-637) RM process is error if property is missing in hawq-site.xml
[ https://issues.apache.org/jira/browse/HAWQ-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231684#comment-15231684 ] ASF GitHub Bot commented on HAWQ-637: - Github user linwen closed the pull request at: https://github.com/apache/incubator-hawq/pull/565 > RM process is error if property is missing in hawq-site.xml > --- > > Key: HAWQ-637 > URL: https://issues.apache.org/jira/browse/HAWQ-637 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Lin Wen >Assignee: Lin Wen > > start hawq in yarn mode,yarn RM address is not configured in hawq-site.xml > scripts show start hawq successfully, but RM process is not correct. > ``` > gpadmin 235458 235448 0 08:48 ?00:00:00 postgres: port 5432, > master resource managercon4 error exit in 2m 0s > ``` > ``` > x86_64 libidn-1.18-2.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 > pam-1.1.1-13.el6.x86_64 zlib-1.2.3-29.el6.x86_64 > (gdb) bt > #0 0x00366e6e14d3 in __select_nocancel () from /lib64/libc.so.6 > #1 0x00b885d0 in pg_usleep (microsec=3000) at pgsleep.c:43 > #2 0x009dd9d8 in elog_debug_linger (edata=0x117c6c0) at elog.c:4129 > #3 0x009d6047 in errfinish (dummy=0) at elog.c:597 > #4 0x009d86b4 in elog_finish (elevel=21, fmt=0xdb4de0 "YARN mode > resource broker failed to start resource broker process. 
error=%d") at > elog.c:1463 > #5 0x00a5e96b in RB_LIBYARN_start (isforked=1 '\001') at > resourcebroker_LIBYARN.c:96 > #6 0x00a5d924 in RB_start (isforked=1 '\001') at > resourcebroker_API.c:58 > #7 0x00a9417f in MainHandlerLoop () at resourcemanager.c:545 > #8 0x00a940d0 in ResManagerMainServer2ndPhase () at > resourcemanager.c:513 > #9 0x00a93b64 in ResManagerMain (argc=3, argv=0x7fffdefaa6f0) at > resourcemanager.c:332 > #10 0x00a93d72 in ResManagerProcessStartup () at resourcemanager.c:400 > #11 0x0089525f in CommenceNormalOperations () at postmaster.c:3673 > #12 0x00895c77 in do_reaper () at postmaster.c:4021 > #13 0x0089203b in ServerLoop () at postmaster.c:2136 > #14 0x008911ae in PostmasterMain (argc=9, argv=0x3407940) at > postmaster.c:1454 > #15 0x007aaf1a in main (argc=9, argv=0x3407940) at main.c:226 > The fix is to let the RB and RM processes keep working normally, even though RB cannot register > itself with the Hadoop YARN RM because the configuration in hawq-site.xml is not correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-640) Wrongly timing the requests for YARN containers causes resource manager not fully acquire resource
[ https://issues.apache.org/jira/browse/HAWQ-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Jin reassigned HAWQ-640:
---------------------------
    Assignee: Yi Jin (was: Lei Chang)

> Wrongly timing the requests for YARN containers causes resource manager not
> fully acquire resource
> ---
>
> Key: HAWQ-640
> URL: https://issues.apache.org/jira/browse/HAWQ-640
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Resource Manager
> Reporter: Yi Jin
> Assignee: Yi Jin
>

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-640) Wrongly timing the requests for YARN containers causes resource manager not fully acquire resource
Yi Jin created HAWQ-640:
------------------------

Summary: Wrongly timing the requests for YARN containers causes resource manager not fully acquire resource
Key: HAWQ-640
URL: https://issues.apache.org/jira/browse/HAWQ-640
Project: Apache HAWQ
Issue Type: Bug
Components: Resource Manager
Reporter: Yi Jin
Assignee: Lei Chang

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAWQ-605) Some segment capacity changes are not logged out and when segment goes to up status, the capacity is not adjusted
[ https://issues.apache.org/jira/browse/HAWQ-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Jin resolved HAWQ-605. - Resolution: Fixed Fix Version/s: 2.0.0 > Some segment capacity changes are not logged out and when segment goes to up > status, the capacity is not adjusted > - > > Key: HAWQ-605 > URL: https://issues.apache.org/jira/browse/HAWQ-605 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Yi Jin >Assignee: Yi Jin > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-634) Let resource manager cancel waiting and return allocated resource when unregister connection rpc is called
[ https://issues.apache.org/jira/browse/HAWQ-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231615#comment-15231615 ]

ASF GitHub Bot commented on HAWQ-634:
-------------------------------------

GitHub user jiny2 opened a pull request:

    https://github.com/apache/incubator-hawq/pull/568

    HAWQ-634. Let resource manager cancel waiting and return allocated resource when unregister connection rpc is called

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiny2/incubator-hawq HAWQ-634

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #568

commit 3d1ddd1af78c1d7b0a33e4e6e6d707e1d3be4a30
Author: YI JIN
Date: 2016-04-08T04:27:42Z

    HAWQ-634. Let resource manager cancel waiting and return allocated resource when unregister connection rpc is called

> Let resource manager cancel waiting and return allocated resource when
> unregister connection rpc is called
> ---
>
> Key: HAWQ-634
> URL: https://issues.apache.org/jira/browse/HAWQ-634
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Resource Manager
> Reporter: Yi Jin
> Assignee: Yi Jin
>
> This is an improvement to help quickly recycle resources used by a cancelled
> query.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-637) RM process is error if property is missing in hawq-site.xml
[ https://issues.apache.org/jira/browse/HAWQ-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231511#comment-15231511 ] ASF GitHub Bot commented on HAWQ-637: - Github user jiny2 commented on the pull request: https://github.com/apache/incubator-hawq/pull/565#issuecomment-207179261 LGTM +1 > RM process is error if property is missing in hawq-site.xml > --- > > Key: HAWQ-637 > URL: https://issues.apache.org/jira/browse/HAWQ-637 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Lin Wen >Assignee: Lin Wen > > start hawq in yarn mode,yarn RM address is not configured in hawq-site.xml > scripts show start hawq successfully, but RM process is not correct. > ``` > gpadmin 235458 235448 0 08:48 ?00:00:00 postgres: port 5432, > master resource managercon4 error exit in 2m 0s > ``` > ``` > x86_64 libidn-1.18-2.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 > pam-1.1.1-13.el6.x86_64 zlib-1.2.3-29.el6.x86_64 > (gdb) bt > #0 0x00366e6e14d3 in __select_nocancel () from /lib64/libc.so.6 > #1 0x00b885d0 in pg_usleep (microsec=3000) at pgsleep.c:43 > #2 0x009dd9d8 in elog_debug_linger (edata=0x117c6c0) at elog.c:4129 > #3 0x009d6047 in errfinish (dummy=0) at elog.c:597 > #4 0x009d86b4 in elog_finish (elevel=21, fmt=0xdb4de0 "YARN mode > resource broker failed to start resource broker process. 
error=%d") at > elog.c:1463 > #5 0x00a5e96b in RB_LIBYARN_start (isforked=1 '\001') at > resourcebroker_LIBYARN.c:96 > #6 0x00a5d924 in RB_start (isforked=1 '\001') at > resourcebroker_API.c:58 > #7 0x00a9417f in MainHandlerLoop () at resourcemanager.c:545 > #8 0x00a940d0 in ResManagerMainServer2ndPhase () at > resourcemanager.c:513 > #9 0x00a93b64 in ResManagerMain (argc=3, argv=0x7fffdefaa6f0) at > resourcemanager.c:332 > #10 0x00a93d72 in ResManagerProcessStartup () at resourcemanager.c:400 > #11 0x0089525f in CommenceNormalOperations () at postmaster.c:3673 > #12 0x00895c77 in do_reaper () at postmaster.c:4021 > #13 0x0089203b in ServerLoop () at postmaster.c:2136 > #14 0x008911ae in PostmasterMain (argc=9, argv=0x3407940) at > postmaster.c:1454 > #15 0x007aaf1a in main (argc=9, argv=0x3407940) at main.c:226 > The fix is to let the RB and RM processes keep working normally, even though RB cannot register > itself with the Hadoop YARN RM because the configuration in hawq-site.xml is not correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-636) hawq check should read configurations in /etc/security/limits.d
[ https://issues.apache.org/jira/browse/HAWQ-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231504#comment-15231504 ] ASF GitHub Bot commented on HAWQ-636: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq/pull/566 > hawq check should read configurations in /etc/security/limits.d > --- > > Key: HAWQ-636 > URL: https://issues.apache.org/jira/browse/HAWQ-636 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools >Reporter: Radar Lei >Assignee: Radar Lei > > Currently 'hawq check' only check configurations from > /etc/security/limits.conf. > We should also include the configuration files inside /etc/security/limits.d/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-636) hawq check should read configurations in /etc/security/limits.d
[ https://issues.apache.org/jira/browse/HAWQ-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231483#comment-15231483 ] ASF GitHub Bot commented on HAWQ-636: - Github user yaoj2 commented on the pull request: https://github.com/apache/incubator-hawq/pull/566#issuecomment-207168475 +1 > hawq check should read configurations in /etc/security/limits.d > --- > > Key: HAWQ-636 > URL: https://issues.apache.org/jira/browse/HAWQ-636 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools >Reporter: Radar Lei >Assignee: Radar Lei > > Currently 'hawq check' only check configurations from > /etc/security/limits.conf. > We should also include the configuration files inside /etc/security/limits.d/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-636) hawq check should read configurations in /etc/security/limits.d
[ https://issues.apache.org/jira/browse/HAWQ-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231479#comment-15231479 ] ASF GitHub Bot commented on HAWQ-636: - Github user huor commented on the pull request: https://github.com/apache/incubator-hawq/pull/566#issuecomment-207167202 +1 > hawq check should read configurations in /etc/security/limits.d > --- > > Key: HAWQ-636 > URL: https://issues.apache.org/jira/browse/HAWQ-636 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools >Reporter: Radar Lei >Assignee: Radar Lei > > Currently 'hawq check' only check configurations from > /etc/security/limits.conf. > We should also include the configuration files inside /etc/security/limits.d/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-628) \l+ returns hcatalog not supported error
[ https://issues.apache.org/jira/browse/HAWQ-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Goden Yao updated HAWQ-628:
---------------------------
    Description:
This is likely not from HAWQ client
{code}
gpadmin=# \l+
ERROR: database hcatalog (OID 6120) is reserved (dbsize.c:185)
gpadmin=# \l
                 List of databases
   Name    |  Owner  | Encoding | Access privileges
-----------+---------+----------+-------------------
 gpadmin   | gpadmin | UTF8     |
 hcatalog  | gpadmin | UTF8     |
 lc_demo   | gpadmin | UTF8     |
 postgres  | gpadmin | UTF8     |
 template0 | gpadmin | UTF8     |
 template1 | gpadmin | UTF8     |
(6 rows)
{code}
*Expected Behavior*
show hcatalog under \l+ normally with as much information as we can provide

  was:
{code}
gpadmin=# \l+
ERROR: database hcatalog (OID 6120) is reserved (dbsize.c:185)
gpadmin=# \l
                 List of databases
   Name    |  Owner  | Encoding | Access privileges
-----------+---------+----------+-------------------
 gpadmin   | gpadmin | UTF8     |
 hcatalog  | gpadmin | UTF8     |
 lc_demo   | gpadmin | UTF8     |
 postgres  | gpadmin | UTF8     |
 template0 | gpadmin | UTF8     |
 template1 | gpadmin | UTF8     |
(6 rows)
{code}
*Expected Behavior*
show hcatalog under \l+ normally with as much information as we can provide

> \l+ returns hcatalog not supported error
> ---
>
> Key: HAWQ-628
> URL: https://issues.apache.org/jira/browse/HAWQ-628
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Hcatalog, PXF
> Reporter: Goden Yao
> Assignee: Goden Yao
>
> This is likely not from HAWQ client
> {code}
> gpadmin=# \l+
> ERROR: database hcatalog (OID 6120) is reserved (dbsize.c:185)
> gpadmin=# \l
>                  List of databases
>    Name    |  Owner  | Encoding | Access privileges
> -----------+---------+----------+-------------------
>  gpadmin   | gpadmin | UTF8     |
>  hcatalog  | gpadmin | UTF8     |
>  lc_demo   | gpadmin | UTF8     |
>  postgres  | gpadmin | UTF8     |
>  template0 | gpadmin | UTF8     |
>  template1 | gpadmin | UTF8     |
> (6 rows)
> {code}
> *Expected Behavior*
> show hcatalog under \l+ normally with as much information as we can provide

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-639) HAWQ support ORC as a native storage format
Goden Yao created HAWQ-639:
---------------------------

Summary: HAWQ support ORC as a native storage format
Key: HAWQ-639
URL: https://issues.apache.org/jira/browse/HAWQ-639
Project: Apache HAWQ
Issue Type: New Feature
Components: Storage
Reporter: Goden Yao
Assignee: Lei Chang

As a user, I want to be able to store my table in ORC format in HAWQ

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-633) Do not error out when cleaning up workfiles during AbortTransaction
[ https://issues.apache.org/jira/browse/HAWQ-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230521#comment-15230521 ]

ASF GitHub Bot commented on HAWQ-633:
-------------------------------------

Github user gcaragea closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/562

> Do not error out when cleaning up workfiles during AbortTransaction
> ---
>
> Key: HAWQ-633
> URL: https://issues.apache.org/jira/browse/HAWQ-633
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Reporter: George Caragea
> Assignee: George Caragea
>
> If we run out of disk space when creating temporary spill files, we will
> error out while writing to disk and abort the transaction.
> When aborting the transaction, as part of the cleanup code we call
> workfile_mgr_unlink_directory() to delete the directory containing all the
> work files. But in some cases that directory might not even have been created,
> because of the out-of-disk-space condition.
> Instead of erroring out again, just give a warning and continue with the
> abort code.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
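The warn-and-continue behaviour described for HAWQ-633 can be sketched as follows. This is only a Python illustration of the pattern, not the C code in HAWQ: `unlink_workfile_directory` is a hypothetical name modeled on `workfile_mgr_unlink_directory()`, and the logger name is an assumption.

```python
import logging
import os
import shutil
import tempfile

log = logging.getLogger("workfile_cleanup")  # hypothetical logger name


def unlink_workfile_directory(path):
    """Remove a spill-file directory during abort without ever raising.

    Returns True if the directory was removed, False if removal failed
    (for example because the directory was never created).
    """
    try:
        shutil.rmtree(path)
        return True
    except OSError as exc:
        # Warn and let the abort path continue instead of erroring out again.
        log.warning("could not remove workfile directory %s: %s", path, exc)
        return False


# Demo: the first call removes a real directory, the second call hits the
# "directory does not exist" case and only logs a warning.
spill_dir = tempfile.mkdtemp()
removed_existing = unlink_workfile_directory(spill_dir)
removed_missing = unlink_workfile_directory(spill_dir)
print(removed_existing, removed_missing)
```

Returning a status instead of raising keeps the decision with the caller, which in an abort path would normally ignore it.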
[jira] [Created] (HAWQ-638) gpload bug using pip installed PyGreSQL
hongwu created HAWQ-638:
------------------------

Summary: gpload bug using pip installed PyGreSQL
Key: HAWQ-638
URL: https://issues.apache.org/jira/browse/HAWQ-638
Project: Apache HAWQ
Issue Type: Bug
Components: Command Line Tools
Reporter: hongwu
Assignee: Lei Chang

Since Greenplum's gpload is based on a private patch on top of PyGreSQL for internal usage, and gpload.py was copied from Greenplum incompletely, the gpload tool generates an error when it is used.

Details:
self.db.notices() depends on the implementation of pg.DB.notices, which was implemented internally in Greenplum, so it is wrong to use this attribute in the gpload tool of HAWQ.

Reference:
https://github.com/apache/incubator-hawq/blob/master/tools/bin/gpload.py#L704
https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/bin/pythonSrc/PyGreSQL-4.0/pgmodule.c#L2929

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
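One defensive way to cope with a driver that lacks `notices()` is to look the method up at run time. The sketch below is an assumption about how such a guard could look, not the change made for HAWQ-638: `collect_notices`, `PatchedDB` and `StockDB` are hypothetical names, and the two classes only stand in for real `pg.DB` connections.

```python
def collect_notices(db):
    """Return server notices if the driver exposes them, else an empty list.

    `db` stands in for the connection object that gpload keeps as self.db.
    Per the report above, notices() exists only in Greenplum's patched
    PyGreSQL, so it is looked up dynamically instead of called directly.
    """
    notices = getattr(db, "notices", None)
    if not callable(notices):
        return []
    return list(notices())


class PatchedDB:
    """Stand-in for a connection from the patched PyGreSQL build."""

    def notices(self):
        return ["NOTICE: example"]


class StockDB:
    """Stand-in for a connection from a pip-installed PyGreSQL."""


print(collect_notices(PatchedDB()))
print(collect_notices(StockDB()))
```

With this guard the loader loses the notice text on a stock driver but no longer fails on the missing attribute.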
[jira] [Commented] (HAWQ-638) gpload bug using pip installed PyGreSQL
[ https://issues.apache.org/jira/browse/HAWQ-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230281#comment-15230281 ] ASF GitHub Bot commented on HAWQ-638: - Github user xunzhang commented on the pull request: https://github.com/apache/incubator-hawq/pull/567#issuecomment-206923112 cc @radarwave ps: I am not sure why Jenkins could not merge into master. > gpload bug using pip installed PyGreSQL > --- > > Key: HAWQ-638 > URL: https://issues.apache.org/jira/browse/HAWQ-638 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Reporter: hongwu >Assignee: Lei Chang > > Since greenplum's gpload is based on private patch upon PyGreSQL for internal > usage, and gpload.py is copied from greenplum incompletely, it will generate > error while using gpload tools. > Details: > self.db.notices() depends on the implementation of pg.DB.notices, which was > implemented internal in greenplum, it is wrong to use this attribute in > gpload tool of hawq. > Reference: > https://github.com/apache/incubator-hawq/blob/master/tools/bin/gpload.py#L704 > https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/bin/pythonSrc/PyGreSQL-4.0/pgmodule.c#L2929 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-638) gpload bug using pip installed PyGreSQL
[ https://issues.apache.org/jira/browse/HAWQ-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230276#comment-15230276 ]

ASF GitHub Bot commented on HAWQ-638:
-------------------------------------

GitHub user xunzhang opened a pull request:

    https://github.com/apache/incubator-hawq/pull/567

    HAWQ-638. gpload bugfix using pip installed PyGreSQL

    See https://issues.apache.org/jira/browse/HAWQ-638 for detailed info.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xunzhang/incubator-hawq hotfix-gpload

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #567

commit 29a3c439eb8e426d79d3c1e71f37c0d84f3792da
Author: xunzhang
Date: 2016-04-07T14:01:59Z

    HAWQ-638. gpload bugfix using pip installed PyGreSQL

> gpload bug using pip installed PyGreSQL
> ---
>
> Key: HAWQ-638
> URL: https://issues.apache.org/jira/browse/HAWQ-638
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Command Line Tools
> Reporter: hongwu
> Assignee: Lei Chang
>
> Since Greenplum's gpload is based on a private patch on top of PyGreSQL for
> internal usage, and gpload.py was copied from Greenplum incompletely, the
> gpload tool generates an error when it is used.
> Details:
> self.db.notices() depends on the implementation of pg.DB.notices, which was
> implemented internally in Greenplum, so it is wrong to use this attribute in
> the gpload tool of HAWQ.
> Reference:
> https://github.com/apache/incubator-hawq/blob/master/tools/bin/gpload.py#L704
> https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/bin/pythonSrc/PyGreSQL-4.0/pgmodule.c#L2929

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-636) hawq check should read configurations in /etc/security/limits.d
[ https://issues.apache.org/jira/browse/HAWQ-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230188#comment-15230188 ]

ASF GitHub Bot commented on HAWQ-636:
-------------------------------------

GitHub user radarwave opened a pull request:

    https://github.com/apache/incubator-hawq/pull/566

    HAWQ-636. hawq check include /etc/security/limits.d

    Tested with config files in '/etc/security/limits.d'; it works well with different config files and with different users when a user domain is defined.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/radarwave/incubator-hawq HAWQ-636

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #566

commit f8fe2cf88e0b529f2a48657e4c7c14b515580e7b
Author: rlei
Date: 2016-04-07T10:16:42Z

    HAWQ-636. hawq check include /etc/security/limits.d

> hawq check should read configurations in /etc/security/limits.d
> ---
>
> Key: HAWQ-636
> URL: https://issues.apache.org/jira/browse/HAWQ-636
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Command Line Tools
> Reporter: Radar Lei
> Assignee: Radar Lei
>
> Currently 'hawq check' only checks configurations from
> /etc/security/limits.conf.
> We should also include the configuration files inside /etc/security/limits.d/

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
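Reading both locations, as HAWQ-636 asks for, could look roughly like this. It is a sketch under stated assumptions rather than the code in the pull request: `limits_files` and `parse_limits` are hypothetical helpers, only `*.conf` files in the directory are considered, and the demo runs on a temporary tree (with a made-up `90-hawq.conf`) instead of the real `/etc/security`.

```python
import glob
import os
import tempfile


def limits_files(conf, conf_dir):
    """List limits.conf (if present) followed by the *.conf files in limits.d."""
    files = [conf] if os.path.isfile(conf) else []
    files.extend(sorted(glob.glob(os.path.join(conf_dir, "*.conf"))))
    return files


def parse_limits(paths):
    """Return (domain, type, item, value) tuples from the files, in file order."""
    entries = []
    for path in paths:
        with open(path) as fh:
            for line in fh:
                # Drop comments, then keep only complete four-field entries.
                fields = line.split("#", 1)[0].split()
                if len(fields) == 4:
                    entries.append(tuple(fields))
    return entries


# Demo on a throwaway directory tree; file names and values are made up.
root = tempfile.mkdtemp()
conf = os.path.join(root, "limits.conf")
conf_dir = os.path.join(root, "limits.d")
os.mkdir(conf_dir)
with open(conf, "w") as fh:
    fh.write("# defaults\n* soft nofile 1024\n")
with open(os.path.join(conf_dir, "90-hawq.conf"), "w") as fh:
    fh.write("gpadmin soft nofile 2900000  # per-user entry\n")

entries = parse_limits(limits_files(conf, conf_dir))
print(entries)
```

On a real host the same helpers would be called with `/etc/security/limits.conf` and `/etc/security/limits.d`; how entries from different files take precedence is left to the caller here.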
[jira] [Commented] (HAWQ-637) RM process is error if property is missing in hawq-site.xml
[ https://issues.apache.org/jira/browse/HAWQ-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230060#comment-15230060 ]

ASF GitHub Bot commented on HAWQ-637:
-------------------------------------

GitHub user linwen opened a pull request:

    https://github.com/apache/incubator-hawq/pull/565

    HAWQ-637. Fix RM process error if property is missing or invalid in h…

    Please review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/linwen/incubator-hawq hawq-637

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #565

commit fda5929b784bcd39458dc6b0dec9d7617110dc38
Author: Wen Lin
Date: 2016-04-07T10:30:42Z

    HAWQ-637. Fix RM process error if property is missing or invalid in hawq-site.xml with Yarn mode.

> RM process is error if property is missing in hawq-site.xml
> ---
>
> Key: HAWQ-637
> URL: https://issues.apache.org/jira/browse/HAWQ-637
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Resource Manager
> Reporter: Lin Wen
> Assignee: Lin Wen
>
> Start HAWQ in YARN mode with the YARN RM address not configured in hawq-site.xml.
> The scripts report that HAWQ started successfully, but the RM process is not in a correct state.
> ```
> gpadmin 235458 235448 0 08:48 ? 00:00:00 postgres: port 5432,
> master resource managercon4 error exit in 2m 0s
> ```
> ```
> x86_64 libidn-1.18-2.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64
> pam-1.1.1-13.el6.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0 0x00366e6e14d3 in __select_nocancel () from /lib64/libc.so.6
> #1 0x00b885d0 in pg_usleep (microsec=3000) at pgsleep.c:43
> #2 0x009dd9d8 in elog_debug_linger (edata=0x117c6c0) at elog.c:4129
> #3 0x009d6047 in errfinish (dummy=0) at elog.c:597
> #4 0x009d86b4 in elog_finish (elevel=21, fmt=0xdb4de0 "YARN mode
> resource broker failed to start resource broker process. error=%d") at
> elog.c:1463
> #5 0x00a5e96b in RB_LIBYARN_start (isforked=1 '\001') at
> resourcebroker_LIBYARN.c:96
> #6 0x00a5d924 in RB_start (isforked=1 '\001') at
> resourcebroker_API.c:58
> #7 0x00a9417f in MainHandlerLoop () at resourcemanager.c:545
> #8 0x00a940d0 in ResManagerMainServer2ndPhase () at
> resourcemanager.c:513
> #9 0x00a93b64 in ResManagerMain (argc=3, argv=0x7fffdefaa6f0) at
> resourcemanager.c:332
> #10 0x00a93d72 in ResManagerProcessStartup () at resourcemanager.c:400
> #11 0x0089525f in CommenceNormalOperations () at postmaster.c:3673
> #12 0x00895c77 in do_reaper () at postmaster.c:4021
> #13 0x0089203b in ServerLoop () at postmaster.c:2136
> #14 0x008911ae in PostmasterMain (argc=9, argv=0x3407940) at
> postmaster.c:1454
> #15 0x007aaf1a in main (argc=9, argv=0x3407940) at main.c:226
> ```
> The fix is to let the RB and RM processes keep working normally, even though RB cannot register
> itself with the Hadoop YARN RM because the configuration in hawq-site.xml is not correct.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-637) RM process is error if property is missing in hawq-site.xml
[ https://issues.apache.org/jira/browse/HAWQ-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Wen reassigned HAWQ-637: Assignee: Lin Wen (was: Lei Chang) > RM process is error if property is missing in hawq-site.xml > --- > > Key: HAWQ-637 > URL: https://issues.apache.org/jira/browse/HAWQ-637 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Lin Wen >Assignee: Lin Wen > > start hawq in yarn mode,yarn RM address is not configured in hawq-site.xml > scripts show start hawq successfully, but RM process is not correct. > ``` > gpadmin 235458 235448 0 08:48 ?00:00:00 postgres: port 5432, > master resource managercon4 error exit in 2m 0s > ``` > ``` > x86_64 libidn-1.18-2.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 > pam-1.1.1-13.el6.x86_64 zlib-1.2.3-29.el6.x86_64 > (gdb) bt > #0 0x00366e6e14d3 in __select_nocancel () from /lib64/libc.so.6 > #1 0x00b885d0 in pg_usleep (microsec=3000) at pgsleep.c:43 > #2 0x009dd9d8 in elog_debug_linger (edata=0x117c6c0) at elog.c:4129 > #3 0x009d6047 in errfinish (dummy=0) at elog.c:597 > #4 0x009d86b4 in elog_finish (elevel=21, fmt=0xdb4de0 "YARN mode > resource broker failed to start resource broker process. 
error=%d") at > elog.c:1463 > #5 0x00a5e96b in RB_LIBYARN_start (isforked=1 '\001') at > resourcebroker_LIBYARN.c:96 > #6 0x00a5d924 in RB_start (isforked=1 '\001') at > resourcebroker_API.c:58 > #7 0x00a9417f in MainHandlerLoop () at resourcemanager.c:545 > #8 0x00a940d0 in ResManagerMainServer2ndPhase () at > resourcemanager.c:513 > #9 0x00a93b64 in ResManagerMain (argc=3, argv=0x7fffdefaa6f0) at > resourcemanager.c:332 > #10 0x00a93d72 in ResManagerProcessStartup () at resourcemanager.c:400 > #11 0x0089525f in CommenceNormalOperations () at postmaster.c:3673 > #12 0x00895c77 in do_reaper () at postmaster.c:4021 > #13 0x0089203b in ServerLoop () at postmaster.c:2136 > #14 0x008911ae in PostmasterMain (argc=9, argv=0x3407940) at > postmaster.c:1454 > #15 0x007aaf1a in main (argc=9, argv=0x3407940) at main.c:226 > The fix is to let the RB and RM processes keep working normally, even though RB cannot register > itself with the Hadoop YARN RM because the configuration in hawq-site.xml is not correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-637) RM process is error if property is missing in hawq-site.xml
Lin Wen created HAWQ-637:
-------------------------

Summary: RM process is error if property is missing in hawq-site.xml
Key: HAWQ-637
URL: https://issues.apache.org/jira/browse/HAWQ-637
Project: Apache HAWQ
Issue Type: Bug
Components: Resource Manager
Reporter: Lin Wen
Assignee: Lei Chang

Start HAWQ in YARN mode with the YARN RM address not configured in hawq-site.xml.
The scripts report that HAWQ started successfully, but the RM process is not in a correct state.
```
gpadmin 235458 235448 0 08:48 ? 00:00:00 postgres: port 5432, master resource managercon4 error exit in 2m 0s
```
```
x86_64 libidn-1.18-2.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 pam-1.1.1-13.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00366e6e14d3 in __select_nocancel () from /lib64/libc.so.6
#1 0x00b885d0 in pg_usleep (microsec=3000) at pgsleep.c:43
#2 0x009dd9d8 in elog_debug_linger (edata=0x117c6c0) at elog.c:4129
#3 0x009d6047 in errfinish (dummy=0) at elog.c:597
#4 0x009d86b4 in elog_finish (elevel=21, fmt=0xdb4de0 "YARN mode resource broker failed to start resource broker process. error=%d") at elog.c:1463
#5 0x00a5e96b in RB_LIBYARN_start (isforked=1 '\001') at resourcebroker_LIBYARN.c:96
#6 0x00a5d924 in RB_start (isforked=1 '\001') at resourcebroker_API.c:58
#7 0x00a9417f in MainHandlerLoop () at resourcemanager.c:545
#8 0x00a940d0 in ResManagerMainServer2ndPhase () at resourcemanager.c:513
#9 0x00a93b64 in ResManagerMain (argc=3, argv=0x7fffdefaa6f0) at resourcemanager.c:332
#10 0x00a93d72 in ResManagerProcessStartup () at resourcemanager.c:400
#11 0x0089525f in CommenceNormalOperations () at postmaster.c:3673
#12 0x00895c77 in do_reaper () at postmaster.c:4021
#13 0x0089203b in ServerLoop () at postmaster.c:2136
#14 0x008911ae in PostmasterMain (argc=9, argv=0x3407940) at postmaster.c:1454
#15 0x007aaf1a in main (argc=9, argv=0x3407940) at main.c:226
```
The fix is to let the RB and RM processes keep working normally, even though RB cannot register itself with the Hadoop YARN RM because the configuration in hawq-site.xml is not correct.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
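The HAWQ-637 report comes down to a YARN address missing from hawq-site.xml. A pre-start check for that condition could be sketched as below. The property names `hawq_global_rm_type`, `hawq_rm_yarn_address` and `hawq_rm_yarn_scheduler_address`, and the Hadoop-style `<property>` layout, are assumptions that do not appear in the report; `missing_yarn_properties` is a hypothetical helper, not part of HAWQ.

```python
import xml.etree.ElementTree as ET

# Assumed property names; verify them against your HAWQ version.
RM_TYPE_PROPERTY = "hawq_global_rm_type"
REQUIRED_YARN_PROPERTIES = (
    "hawq_rm_yarn_address",
    "hawq_rm_yarn_scheduler_address",
)


def missing_yarn_properties(xml_text, required=REQUIRED_YARN_PROPERTIES):
    """Return required YARN properties that are absent or empty.

    An empty list is returned when the file does not select YARN mode.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            values[name] = value
    if values.get(RM_TYPE_PROPERTY) != "yarn":
        return []
    return [name for name in required if not values.get(name)]


# Sample file content with the RM address left out; host and port are made up.
sample = """
<configuration>
  <property><name>hawq_global_rm_type</name><value>yarn</value></property>
  <property>
    <name>hawq_rm_yarn_scheduler_address</name>
    <value>rmhost:8030</value>
  </property>
</configuration>
"""

missing = missing_yarn_properties(sample)
print(missing)
```

A start script could run such a check and report the missing names before launching the master, rather than reporting success while the RM process is in an error state.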
[jira] [Assigned] (HAWQ-635) QE process does not exit in libhdfs
[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming LI reassigned HAWQ-635: Assignee: Ming LI (was: Lei Chang) > QE process does not exit in libhdfs > --- > > Key: HAWQ-635 > URL: https://issues.apache.org/jira/browse/HAWQ-635 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Ming LI >Assignee: Ming LI > > The QE process cannot exit. > The calling stack is: > [gpadmin@sdw3 ~]$ pstack 489333 > #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 > #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec > const&) () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 > #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #5 0x00540c59 in boost::detail::shared_count::~shared_count() () > #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 > #7 0x7ff755b04456 in __do_global_dtors_aux () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAWQ-635) QE process does not exit in libhdfs
[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming LI resolved HAWQ-635. -- Resolution: Fixed > QE process does not exit in libhdfs > --- > > Key: HAWQ-635 > URL: https://issues.apache.org/jira/browse/HAWQ-635 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Ming LI >Assignee: Lei Chang > > The QE process cannot exit. > The calling stack is: > [gpadmin@sdw3 ~]$ pstack 489333 > #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 > #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec > const&) () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 > #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #5 0x00540c59 in boost::detail::shared_count::~shared_count() () > #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 > #7 0x7ff755b04456 in __do_global_dtors_aux () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-635) QE process does not exit in libhdfs
[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230048#comment-15230048 ] ASF GitHub Bot commented on HAWQ-635: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq/pull/564 > QE process does not exit in libhdfs > --- > > Key: HAWQ-635 > URL: https://issues.apache.org/jira/browse/HAWQ-635 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Ming LI >Assignee: Lei Chang > > The QE process cannot exit. > The calling stack is: > [gpadmin@sdw3 ~]$ pstack 489333 > #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 > #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec > const&) () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 > #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #5 0x00540c59 in boost::detail::shared_count::~shared_count() () > #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 > #7 0x7ff755b04456 in __do_global_dtors_aux () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-635) QE process does not exit in libhdfs
[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230042#comment-15230042 ] ASF GitHub Bot commented on HAWQ-635: - Github user liming01 commented on the pull request: https://github.com/apache/incubator-hawq/pull/564#issuecomment-206797814 @wangzw , yes, I agree with you. Thanks. > QE process does not exit in libhdfs > --- > > Key: HAWQ-635 > URL: https://issues.apache.org/jira/browse/HAWQ-635 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Ming LI >Assignee: Lei Chang > > The QE process cannot exit. > The calling stack is: > [gpadmin@sdw3 ~]$ pstack 489333 > #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 > #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec > const&) () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 > #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #5 0x00540c59 in boost::detail::shared_count::~shared_count() () > #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 > #7 0x7ff755b04456 in __do_global_dtors_aux () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-636) hawq check should read configurations in /etc/security/limits.d
[ https://issues.apache.org/jira/browse/HAWQ-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radar Lei reassigned HAWQ-636: -- Assignee: Radar Lei (was: Lei Chang) > hawq check should read configurations in /etc/security/limits.d > --- > > Key: HAWQ-636 > URL: https://issues.apache.org/jira/browse/HAWQ-636 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools > Reporter: Radar Lei > Assignee: Radar Lei > > Currently 'hawq check' only checks configurations from > /etc/security/limits.conf. > We should also include the configuration files inside /etc/security/limits.d/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-636) hawq check should read configurations in /etc/security/limits.d
Radar Lei created HAWQ-636: -- Summary: hawq check should read configurations in /etc/security/limits.d Key: HAWQ-636 URL: https://issues.apache.org/jira/browse/HAWQ-636 Project: Apache HAWQ Issue Type: Improvement Components: Command Line Tools Reporter: Radar Lei Assignee: Lei Chang Currently 'hawq check' only checks configurations from /etc/security/limits.conf. We should also include the configuration files inside /etc/security/limits.d/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
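A minimal sketch of the file selection this improvement asks for, written in C++ only to match the other code quoted in this archive; it is not the 'hawq check' implementation. It assumes the order documented for pam_limits: /etc/security/limits.conf first, then the *.conf files of /etc/security/limits.d/ in bytewise ("C" locale) order. The function name `limitsFilesToRead` and its parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the files a limits check could read, in order.
// `dirEntries` holds the plain file names found in /etc/security/limits.d/.
// Assumption: limits.conf comes first, then the *.conf entries sorted bytewise.
inline std::vector<std::string> limitsFilesToRead(std::vector<std::string> dirEntries) {
    const std::string suffix = ".conf";
    std::vector<std::string> files{"/etc/security/limits.conf"};

    // std::string comparison is bytewise, which matches "C" locale ordering.
    std::sort(dirEntries.begin(), dirEntries.end());

    for (const std::string &name : dirEntries) {
        // Keep only names that end in ".conf" and have a non-empty stem.
        const bool isConf =
            name.size() > suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
        if (isConf) {
            files.push_back("/etc/security/limits.d/" + name);
        }
    }
    return files;
}
```

Taking the directory listing as a parameter keeps the ordering and filtering logic separate from file-system access, so it can be checked without touching /etc.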
[jira] [Commented] (HAWQ-635) QE process does not exit in libhdfs
[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229958#comment-15229958 ] ASF GitHub Bot commented on HAWQ-635: - Github user wangzw commented on the pull request: https://github.com/apache/incubator-hawq/pull/564#issuecomment-206774175
If the issue is caused by an exception thrown by RpcClientImpl::getChannel(), the only way that can happen is if the system runs out of memory or fails to create a thread. In either case, you will see the error message "RpcClient failed to create a channel to" in the HAWQ log. Please refer to the code in RpcClientImpl::getChannel():
    try {
        ...
        rc->addRef();
        if (!cleaning) {
            cleaning = true;
            if (cleaner.joinable()) {
                cleaner.join();
            }
            CREATE_THREAD(cleaner, bind(::clean, this));
        }
    } catch ...
Isn't it much easier to fix this issue by moving ```rc->addRef();``` to the end of the ```try``` block?
> QE process does not exit in libhdfs > --- > > Key: HAWQ-635 > URL: https://issues.apache.org/jira/browse/HAWQ-635 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Ming LI >Assignee: Lei Chang > > The QE process cannot exit. 
> The calling stack is: > [gpadmin@sdw3 ~]$ pstack 489333 > #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 > #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec > const&) () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 > #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #5 0x00540c59 in boost::detail::shared_count::~shared_count() () > #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 > #7 0x7ff755b04456 in __do_global_dtors_aux () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
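The ordering question raised in this comment can be sketched in isolation. The following is a simplified illustration, not the libhdfs3 code: `Channel`, `startCleaner`, `getChannelEarlyRef`, and `getChannelLateRef` are hypothetical stand-ins, and the failing step is simulated with a flag. It assumes only what the thread states, namely that a reference taken before a throwing step is not released when the exception is caught.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for an RPC channel with a manual reference count.
struct Channel {
    int refs = 0;
    void addRef() { ++refs; }
};

// Hypothetical stand-in for a step that may throw, such as thread creation.
inline void startCleaner(bool fail) {
    if (fail) {
        throw std::runtime_error("failed to create thread");
    }
}

// Ordering as quoted: the reference is taken before the step that can throw.
inline bool getChannelEarlyRef(Channel &rc, bool fail) {
    try {
        rc.addRef();
        startCleaner(fail);
        return true;
    } catch (const std::exception &) {
        return false;  // rc.refs stays incremented, so the reference leaks
    }
}

// Suggested ordering: take the reference only after every throwing step succeeded.
inline bool getChannelLateRef(Channel &rc, bool fail) {
    try {
        startCleaner(fail);
        rc.addRef();
        return true;
    } catch (const std::exception &) {
        return false;  // rc.refs is untouched
    }
}
```

With the early ordering, a failed call leaves the count at 1 even though no caller holds the channel; with the late ordering it stays at 0. Anything that waits for the count to drop to zero at shutdown would only be affected in the first case.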
[jira] [Commented] (HAWQ-635) QE process does not exit in libhdfs
[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229933#comment-15229933 ] ASF GitHub Bot commented on HAWQ-635: - Github user wangzw commented on the pull request: https://github.com/apache/incubator-hawq/pull/564#issuecomment-206763422 Sure, will do. > QE process does not exit in libhdfs > --- > > Key: HAWQ-635 > URL: https://issues.apache.org/jira/browse/HAWQ-635 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Ming LI >Assignee: Lei Chang > > The QE process cannot exit. > The calling stack is: > [gpadmin@sdw3 ~]$ pstack 489333 > #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 > #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec > const&) () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 > #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #5 0x00540c59 in boost::detail::shared_count::~shared_count() () > #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 > #7 0x7ff755b04456 in __do_global_dtors_aux () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-635) QE process does not exit in libhdfs
[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229915#comment-15229915 ] ASF GitHub Bot commented on HAWQ-635: - GitHub user liming01 opened a pull request: https://github.com/apache/incubator-hawq/pull/564 HAWQ-635. QE process does not exit in libhdfs
1) The problem is caused by a wrong reference count (refs) in the RpcChannelImpl class, so the process runs in an endless loop when it exits. I suspect it is caused by an exception thrown by RpcClientImpl::getChannel(), which has already called addRef() but does not call close() when the exception occurs.
2) This problem cannot be fixed with a signal, because the cleanup process has already been called when the endless loop occurs.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/liming01/incubator-hawq mli/process_not_exit_libhdfs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/564.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #564 commit 3156d8a0fa4197ab154ac3fbc3f6f42aa1715dbf Author: Ming LI Date: 2016-04-07T08:32:39Z HAWQ-635. QE process does not exit in libhdfs
> QE process does not exit in libhdfs > --- > > Key: HAWQ-635 > URL: https://issues.apache.org/jira/browse/HAWQ-635 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Ming LI >Assignee: Lei Chang > > The QE process cannot exit. 
> The calling stack is: > [gpadmin@sdw3 ~]$ pstack 489333 > #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 > #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec > const&) () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 > #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () > from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #5 0x00540c59 in boost::detail::shared_count::~shared_count() () > #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 > #7 0x7ff755b04456 in __do_global_dtors_aux () from > /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 > #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-635) QE process does not exit in libhdfs
Ming LI created HAWQ-635: Summary: QE process does not exit in libhdfs Key: HAWQ-635 URL: https://issues.apache.org/jira/browse/HAWQ-635 Project: Apache HAWQ Issue Type: Bug Reporter: Ming LI Assignee: Lei Chang The QE process cannot exit. The calling stack is: [gpadmin@sdw3 ~]$ pstack 489333 #0 0x0033f560ef3d in nanosleep () from /lib64/libpthread.so.0 #1 0x7ff75309c74a in boost::this_thread::hiden::sleep_for(timespec const&) () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0 #2 0x7ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 #3 0x7ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 #4 0x7ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 #5 0x00540c59 in boost::detail::shared_count::~shared_count() () #6 0x0033f52361bd in __cxa_finalize () from /lib64/libc.so.6 #7 0x7ff755b04456 in __do_global_dtors_aux () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1 #8 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-614) Table with Segment Reject Limit fails to flush AO file when all data is rejected
[ https://issues.apache.org/jira/browse/HAWQ-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229848#comment-15229848 ] hongwu commented on HAWQ-614: - Yes, because when I try to reproduce the scenario with gpload (a CSV file with 2000 bad lines, ERROR_LIMIT = 3000), it works well too. Please paste the gpload log file and your YAML configuration file. > Table with Segment Reject Limit fails to flush AO file when all data is > rejected > > > Key: HAWQ-614 > URL: https://issues.apache.org/jira/browse/HAWQ-614 > Project: Apache HAWQ > Issue Type: Bug > Components: External Tables >Reporter: Kyle R Dunn >Assignee: hongwu >Priority: Minor > Attachments: image008.jpg > > > An error message (attached) is received if *all* data gets rejected (for any > reason) when using the segment reject limit option with an error table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)