In upcoming changes to Jenkins automation, I will add the daily build downloads link to the daily-build test result email that gets sent to this list.
--Steve

-----Original Message-----
From: Roberta Marton [mailto:[email protected]]
Sent: Thursday, October 8, 2015 9:51 AM
To: [email protected]
Subject: RE: trafodion won't start core files are generated

Is this something that should be added to the Apache Trafodion website/wiki?

Roberta

-----Original Message-----
From: Steve Varnau [mailto:[email protected]]
Sent: Thursday, October 8, 2015 9:47 AM
To: [email protected]
Subject: RE: trafodion won't start core files are generated

Daily builds for development/test are posted at
http://traf-downloads.esgyn.com/

--Steve

-----Original Message-----
From: Suresh Subbiah [mailto:[email protected]]
Sent: Thursday, October 8, 2015 7:10 AM
To: [email protected]
Subject: Re: trafodion won't start core files are generated

Hi,

What is the suggested procedure to pick up a daily build?

Thanks
Suresh

On Thu, Oct 8, 2015 at 1:02 AM, Prashanth Vasudev <[email protected]> wrote:

> Memorymonitor.cpp fix is part of this
> https://issues.apache.org/jira/browse/TRAFODION-1492
> Please pick up latest daily build.
>
> Also max locked memory 64kb below appears very small.
>
> Regards,
> Prashanth
>
> -----Original Message-----
> From: Radu Marias [mailto:[email protected]]
> Sent: Wednesday, October 7, 2015 8:45 AM
> To: dev <[email protected]>
> Subject: Re: trafodion won't start core files are generated
>
> Hi,
>
> I have these:
>
> # pwd
> /dev/shm
> # ls -la
> total 4
> drwxrwxrwx 2 root root 60 Oct 6 21:07 .
> drwxr-xr-x 9 root root 2180 Oct 2 22:28 ..
> -rw-r--r-- 1 trafodion trafodion 32 Oct 6 21:07 sem.monitor.sem.trafodion
>
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
>
> # ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 1805076
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 65535
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 10240
> cpu time (seconds, -t) unlimited
> max user processes (-u) 65535
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> I would try to reinstall trafodion to see if something got corrupted
> and maybe that would fix the issue but I know there was a crash on
> sqstart and one of your guys fixed it and copied the lib file to our
> cluster:
>
> This is a response from Narendra in a previous thread where the issue
> was fixed to start the trafodion:
>
> > *I updated the code: sql/cli/memmonitor.cpp, so that if
> > /proc/meminfo does not have the ‘Committed_AS’ entry, it will ignore
> > it. Built it and put the
> > binary: libcli.so on the veracity box (in the
> > $MY_SQROOT/export/lib64 directory – on all the nodes). Restarted the
> > env and ‘sqlci’ worked fine.
> > Was able to ‘initialize trafodion’ and create a table.*
>
> There was another one similar which I see it's closed
> https://issues.apache.org/jira/browse/TRAFODION-1492
>
> So the idea is are these fixes in the latest daily build and I can try
> to reinstall? Or please send the changed files so I can override after
> reinstall.
>
> On Wed, Oct 7, 2015 at 6:02 PM, Selva Govindarajan <[email protected]> wrote:
>
> > You would want to retain the shared segment size across reboots.
> > So, please check if the following settings are available in
> > /etc/sysctl.conf
> >
> > # Controls the maximum shared segment size, in bytes
> > kernel.shmmax = 134217728
> >
> > # Controls the maximum number of shared memory segments, in pages
> > kernel.shmall = 4294967296
> >
> > shmmax needs to be at least 64 MB. By default, Trafodion RMS shared
> > segment size is 64 MB. Trafodion RMS shared segment can be expanded
> > to 128 MB. So, it is better to set shmmax to 128 MB, just in case we
> > need to expand it later.
> >
> > Selva
> >
> > -----Original Message-----
> > From: Prashanth Vasudev [mailto:[email protected]]
> > Sent: Tuesday, October 6, 2015 2:19 PM
> > To: [email protected]
> > Subject: RE: trafodion won't start core files are generated
> >
> > Hi,
> > From the stack trace below, it appears trafodion monitor is unable
> > to create shared memory objects.
> > Please make sure ulimit settings on all nodes have high limits for
> > max locked memory.
> > Also make sure /dev/shm on all nodes has the correct write
> > permissions for the trafodion user id.
> >
> > Regards,
> > Prashanth
> >
> > -----Original Message-----
> > From: Radu Marias [mailto:[email protected]]
> > Sent: Tuesday, October 6, 2015 9:21 AM
> > To: dev <[email protected]>
> > Subject: trafodion won't start core files are generated
> >
> > Hi,
> >
> > At some point a node from the 5-node cluster stopped and we
> > needed to restart it. After that I restarted all the ambari and
> > hdp services but trafodion fails to start.
> >
> > Below are some stack traces and details for files for which I'm not
> > getting any stack. Files are from node1 and node2 and are from Oct 2
> > (when I think node 2 was down) and Oct 6 (when we rebooted the node
> > and tried to start trafodion). Feel free to connect and debug the
> > issue on our cluster, Amanda has the credentials.
> >
> > *FROM NODE1*
> >
> > Oct 2 22:27 core.39347
> > core.39347: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> > SVR4-style, from 'tm SQMON1.1 00000 00000 039347 $TM0
> > 188.138.61.175:60186 00002 00000 00009 SPAR'
> > gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.39347
> > no stack
> >
> > Oct 2 22:41 core.15144
> > Program terminated with signal 6, Aborted.
> > #0 0x00007f77bcbbb625 in ?? ()
> > #1 0x00007f77bcbbce05 in ?? ()
> > #2 0x0000000000000010 in ?? () at ../common/Collections.cpp:109
> > #3 0x00007f77bee62130 in ?? ()
> > #4 0x00007ffe8e796ec0 in ?? ()
> > #5 0x00007f77bdeced00 in ?? ()
> > #6 0x0000000000000004 in ?? () at ../common/Collections.cpp:109
> > #7 0x0000000001b3a310 in ?? ()
> > #8 0x0000000000000000 in ?? ()
> >
> > Oct 2 22:41 core.39240
> > #0 0x00007f534d03c625 in raise () from /lib64/libc.so.6
> > #1 0x00007f534d03de05 in abort () from /lib64/libc.so.6
> > #2 0x00007f534d03574e in __assert_fail_base () from /lib64/libc.so.6
> > #3 0x00007f534d035810 in __assert_fail () from /lib64/libc.so.6
> > #4 0x000000000046e213 in CExtTmLeaderReq::performRequest (this=0x7f53340008c0) at reqtmleader.cxx:126
> > #5 0x000000000045a64a in CReqWorker::reqWorkerThread (this=<value optimized out>) at reqworker.cxx:79
> > #6 0x000000000045a86d in reqWorker (arg=0xc6f9a0) at reqworker.cxx:147
> > #7 0x00007f534db45a51 in start_thread () from /lib64/libpthread.so.0
> > #8 0x00007f534d0f29ad in clone () from /lib64/libc.so.6
> >
> > Oct 2 22:41 core.15309
> > core.15309: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> > SVR4-style, from 'tm SQMON1.1 00000 00000 015309 $TM0
> > 188.138.61.175:60186 00002 00000 00134 SPAR'
> > gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.15309
> > no stack
> >
> > *FROM NODE2*
> >
> > Oct 2 22:29 core.39491
> > core.39491: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> > SVR4-style, from 'tm SQMON1.1 00001
> > 00001 039491 $TM1 188.138.61.177:38680 00002 00001 00003 SPAR'
> > gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.39491
> > no stack
> >
> > Oct 6 15:23 core.1394
> > Program terminated with signal 6, Aborted.
> > #0 0x00007fb97acbf625 in raise () from /lib64/libc.so.6
> > #1 0x00007fb97acc0e05 in abort () from /lib64/libc.so.6
> > #2 0x000000000041d07d in CProcessContainer::CProcessContainer (this=0x2071880, nodeContainer=<value optimized out>) at process.cxx:3366
> > #3 0x0000000000453f5c in CNode::CNode (this=0x2071880, name=0x204c448 "euve79672", pnid=0, rank=0) at pnode.cxx:153
> > #4 0x00000000004558e0 in CNodeContainer::AddNodes (this=<value optimized out>) at pnode.cxx:1564
> > #5 0x00000000004169a5 in CCluster::InitializeConfigCluster (this=0x20757b0) at cluster.cxx:2740
> > #6 0x0000000000417645 in CCluster::CCluster (this=0x20757b0) at cluster.cxx:567
> > #7 0x0000000000431e1a in CTmSync_Container::CTmSync_Container (this=0x20757b0) at tmsync.cxx:137
> > #8 0x0000000000407bb6 in CMonitor::CMonitor (this=0x20757b0, procTermSig=9) at monitor.cxx:323
> > #9 0x00000000004086ad in main (argc=2, argv=0x7fff8322e298) at monitor.cxx:1152
> >
> > Oct 6 15:43 core.17626
> > Program terminated with signal 6, Aborted.
> > #0 0x00007fcf11aea625 in raise () from /lib64/libc.so.6
> > #1 0x00007fcf11aebe05 in abort () from /lib64/libc.so.6
> > #2 0x000000000041d07d in CProcessContainer::CProcessContainer (this=0x1182890, nodeContainer=<value optimized out>) at process.cxx:3366
> > #3 0x0000000000453f5c in CNode::CNode (this=0x1182890, name=0x115d458 "euve79672", pnid=0, rank=0) at pnode.cxx:153
> > #4 0x00000000004558e0 in CNodeContainer::AddNodes (this=<value optimized out>) at pnode.cxx:1564
> > #5 0x00000000004169a5 in CCluster::InitializeConfigCluster (this=0x11867c0) at cluster.cxx:2740
> > #6 0x0000000000417645 in CCluster::CCluster (this=0x11867c0) at cluster.cxx:567
> > #7 0x0000000000431e1a in CTmSync_Container::CTmSync_Container (this=0x11867c0) at tmsync.cxx:137
> > #8 0x0000000000407bb6 in CMonitor::CMonitor (this=0x11867c0, procTermSig=9) at monitor.cxx:323
> > #9 0x00000000004086ad in main (argc=2, argv=0x7ffcaca91f68) at monitor.cxx:1152
> >
> > --
> > And in the end, it's not the years in your life that count. It's the
> > life in your years.
> >
>
> --
> And in the end, it's not the years in your life that count. It's the
> life in your years.
>
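
For anyone checking a node against the advice in this thread, here is a rough Python sketch of two of the checks discussed: tolerating a /proc/meminfo that has no Committed_AS entry, and comparing kernel.shmmax against the 64 MB minimum and 128 MB recommendation Selva mentions. This is only an illustration, not the actual change made in sql/cli/memmonitor.cpp; the function names and the wording of the returned messages are made up for this example.

```python
import re

# Thresholds taken from the thread: the RMS shared segment defaults to 64 MB
# and can be expanded to 128 MB.
MIN_SHMMAX = 64 * 1024 * 1024
RECOMMENDED_SHMMAX = 128 * 1024 * 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}.

    Fields the kernel does not report are simply absent from the dict.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"^([\w()]+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def committed_as_kb(meminfo):
    """Return Committed_AS in kB, or None if the entry is missing.

    Hypothetical helper: callers treat None as "skip this metric"
    rather than as an error.
    """
    return meminfo.get("Committed_AS")


def parse_sysctl(text):
    """Parse sysctl.conf-style 'key = value' lines, ignoring # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_shmmax(shmmax_bytes):
    """Classify a kernel.shmmax value against the thresholds above."""
    if shmmax_bytes < MIN_SHMMAX:
        return "too small"
    if shmmax_bytes < RECOMMENDED_SHMMAX:
        return "ok, but below 128 MB"
    return "ok"
```

On a real node you would feed these the contents of /proc/meminfo and /etc/sysctl.conf, for example parse_meminfo(open("/proc/meminfo").read()). The max locked memory limit and the /dev/shm permissions Prashanth mentions still need to be checked separately.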
