I did the steps mentioned above to ensure that the Trafodion processes are
free of the JAVA installation mixup. I also changed the setup so that HDP,
Trafodion, and HammerDB all use the same JDK from */usr/jdk64/jdk1.7.0_67*:

# java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

# echo $JAVA_HOME
/usr/jdk64/jdk1.7.0_67
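
A quick way to confirm the JDK really is the same everywhere could be a loop
like this (a sketch only; the node names and password-less ssh access are
assumptions about this cluster):

```shell
# Sketch: verify every node resolves JAVA_HOME and the `java` command to
# the same JDK. NODES is a hypothetical host list; replace it with the
# cluster's real nodes.
NODES="node1 node2 node3 node4 node5"
for n in $NODES; do
  printf '== %s ==\n' "$n"
  ssh "$n" 'echo "JAVA_HOME=$JAVA_HOME";
            readlink -f "$(command -v java)";
            java -version 2>&1 | head -1'
done
```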

But when running HammerDB I again got a crash on 2 nodes. I noticed that
for about one minute before the crash I was getting errors from *java
-version*, and about 30 seconds after the crash *java -version* worked
again, so these issues might be related. I haven't yet found the cause of
the *java -version* failure or how to fix it.

# java -version
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit
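
Since the failure seems to come and go with memory pressure, a probe loop
like the one below could tie the two together (a hedged sketch: the log
path and the two-minute window are assumptions; the counter names come
from /proc/meminfo):

```shell
# Sketch: once a second, record whether `java -version` succeeds together
# with the memory counters from /proc/meminfo, to correlate the transient
# JVM-start failures with memory pressure at crash time.
LOG=/tmp/jvm_mem_probe.log
for i in $(seq 1 120); do
  ts=$(date '+%F %T')
  if java -version >/dev/null 2>&1; then ok=OK; else ok=FAIL; fi
  # Pull the commit-accounting fields; Committed_AS is the one Trafodion
  # already reads in sql/cli/memmonitor.cpp.
  mem=$(awk '/^(MemFree|CommitLimit|Committed_AS):/ {gsub(":","",$1); printf "%s=%skB ", $1, $2}' /proc/meminfo)
  echo "$ts java=$ok $mem" >> "$LOG"
  sleep 1
done
```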

# file core.5813
core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'tdm_udrserv SQMON1.1 00000 00000 005813 $Z0004R3 188.138.61.175:48357
00004 000'

#0  0x00007f6920ba0625 in raise () from /lib64/libc.so.6
#1  0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
#2  0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR Server
Internal Error", msg2=<value optimized out>, msg3=0x7fff119787f0 "Source
file information unavailable",
    msg4=0x7fff11977ff0 "User routine being processed :
TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure, Language Type :
JAVA, Error occurred outside the user routine code", msg5=0x43ddc3 "",
dialOut=<value optimized out>, writeToSeaLog=1) at
../udrserv/UdrFFDC.cpp:191
#3  0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The Java
virtual machine aborted", file=<value optimized out>, line=<value optimized
out>, dialOut=1, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:219
#4  0x00007f69232316b8 in LmJavaHooks::abortHookJVM () at
../langman/LmJavaHooks.cpp:54
#5  0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#6  0x00007f6922afedba in Universe::initialize_heap() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#7  0x00007f6922afff89 in universe_init() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#8  0x00007f692273d9f5 in init_globals() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#9  0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*, bool*) ()
from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#10 0x00007f69227c5a34 in JNI_CreateJavaVM () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#11 0x00007f692322de51 in LmLanguageManagerJava::initialize (this=<value
optimized out>, result=<value optimized out>, maxLMJava=<value optimized
out>, userOptions=0x7f69239ba418, diagsArea=<value optimized out>) at
../langman/LmLangManagerJava.cpp:379
#12 0x00007f692322f564 in LmLanguageManagerJava::LmLanguageManagerJava
(this=0x7f69239bec38, result=@0x7fff1197e19c, commandLineMode=<value
optimized out>, maxLMJava=1, userOptions=0x7f69239ba418,
diagsArea=0x7f6923991780) at ../langman/LmLangManagerJava.cpp:155
#13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM
(this=0x7f69239ba040, result=@0x7fff1197e19c, diags=<value optimized out>)
at ../udrserv/udrglobals.cpp:322
#14 0x0000000000427328 in processALoadMessage (UdrGlob=0x7f69239ba040,
msgStream=..., request=..., env=<value optimized out>) at
../udrserv/udrload.cpp:163
#15 0x000000000042fbfd in processARequest (UdrGlob=0x7f69239ba040,
msgStream=..., env=...) at ../udrserv/udrserv.cpp:660
#16 0x000000000043269c in runServer (argc=2, argv=0x7fff1197e528) at
../udrserv/udrserv.cpp:520
#17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528) at
../udrserv/udrserv.cpp:356
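
Taking the trace in gdb batch mode could make this repeatable (a sketch;
the bin64 path under $MY_SQROOT is an assumption, and it must run on the
same node that produced the core or the frames will not resolve):

```shell
# Sketch: dump a full backtrace from a udrserv core file non-interactively.
CORE=core.5813
BIN="$MY_SQROOT/export/bin64/tdm_udrserv"   # assumed binary location
gdb --batch -ex 'bt full' "$BIN" "$CORE" > "${CORE}.bt.txt" 2>&1
head -5 "${CORE}.bt.txt"
```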

On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <[email protected]>
wrote:

> Hi,
>
> I have added a wiki page that describes how to get a stack trace from a
> core file. The page could do with some improvements on finding the core
> file and maybe even doing more than getting the stack trace. For now it
> should make our troubleshooting cycle faster if the stack trace is included
> in the initial message itself.
>
> https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
>
> In this case, the last node does not seem to have gdb, so I could not see
> the trace there. I moved the core file to the first node but then the trace
> looks like this. I assume this is because I moved the core file to a
> different node. I think Selva's suggestion is good to try. We may have had
> a few tdm_udrserv processes from before the time the java change was made.
>
> $ gdb tdm_udrserv core.49256
> #0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
> #1  0x8857780a58ff2155 in ?? ()
> Cannot access memory at address 0x8857780a58ff2155
>
> The back trace we saw yesterday when a udrserv process exited when JVM
> could not be started is used in the wiki page instead of this one. If you
> have time a JIRA on this unexpected udrserv exit will also be valuable for
> the Trafodion team.
>
> Thanks
> Suresh
>
> On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
> [email protected]> wrote:
>
> > Thanks for creating the JIRA Trafodion-1492. The error is similar to
> > scenario 2. The process tdm_udrserv dumped core. We will look into the
> > core file. In the meantime, can you please do the following:
> >
> > Bring the Trafodion instance down
> > echo $MY_SQROOT -- shows Trafodion installation directory
> > Remove $MY_SQROOT/etc/ms.env from all nodes
> >
> >
> > Start a New Terminal Session so that new Java settings are in place
> > Login as a Trafodion user
> > cd <trafodion_installation_directory>
> > . ./sqenv.sh  (skip this if it is done automatically upon logon)
> > sqgen
> >
> > Exit and Start a New Terminal Session
> > Restart the Trafodion instance and check if you are seeing the issue with
> > tdm_udrserv again. As mentioned in your earlier message, we wanted to
> > ensure that the Trafodion processes are free of the JAVA installation
> > mixup. We suspect it can cause the tdm_udrserv process to dump core.
> >
> >
> > Selva
> >
> > -----Original Message-----
> > From: Radu Marias [mailto:[email protected]]
> > Sent: Wednesday, September 16, 2015 5:40 AM
> > To: dev <[email protected]>
> > Subject: Re: odbc and/or hammerdb logs
> >
> > I'm seeing this in the HammerDB logs; I assume it is due to the crash and
> > some processes being stopped:
> >
> > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion Database] SQL
> > ERROR:*** ERROR[2034] $Z0106BZ:16: Operating system error 201 while
> > communicating with server process $Z010LPE:23. [2015-09-16 12:35:33]
> > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[8904] SQL
> > did not receive a reply from MXUDR, possibly caused by internal errors
> > when executing user-defined routines. [2015-09-16 12:35:33]
> >
> > $ sqcheck
> > Checking if processes are up.
> > Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
> >
> > The SQ environment is up!
> >
> >
> > Process         Configured      Actual      Down
> > -------         ----------      ------      ----
> > DTM             5               5
> > RMS             10              10
> > MXOSRVR         20              20
> >
> > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <[email protected]>
> wrote:
> >
> > > I've restarted HDP and Trafodion and now I managed to create the
> > > schema and stored procedures from HammerDB. But I'm again getting
> > > failures and core dumps from Trafodion while running virtual users.
> > > For some of the users I sometimes see in the HammerDB logs:
> > > Vuser 5:Failed to execute payment
> > > Vuser 5:Failed to execute stock level
> > > Vuser 5:Failed to execute new order
> > >
> > > Core files are on our last node; feel free to examine them. The files
> > > were dumped while getting the HammerDB errors:
> > >
> > > *core.49256*
> > >
> > > *core.48633*
> > >
> > > *core.49290*
> > >
> > >
> > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <[email protected]>
> > wrote:
> > >
> > >> *Scenario 1:*
> > >>
> > >> I've created this issue
> > >> https://issues.apache.org/jira/browse/TRAFODION-1492
> > >> I think another fix was made related to *Committed_AS* in
> > >> *sql/cli/memmonitor.cpp*.
> > >>
> > >> This is a response from Narendra in a previous thread where the issue
> > >> was fixed to start the trafodion:
> > >>
> > >>
> > >>>
> > >>>
> > >>>
> > >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
> > >>> /proc/meminfo does not have the ‘Committed_AS’ entry, it will ignore
> > >>> it. Built it and put the binary: libcli.so on the veracity box (in
> > >>> the $MY_SQROOT/export/lib64 directory – on all the nodes). Restarted
> > >>> the env and ‘sqlci' worked fine.
> > >>> Was able to ‘initialize trafodion’ and create a table.*
> > >>
> > >>
> > >> *Scenario 2:*
> > >>
> > >> As I recall, we had the *java -version* problem only on the other
> > >> cluster with CentOS 7; I didn't see it on this one with CentOS 6.7.
> > >> But a change I made these days on the latter is installing Oracle *jdk
> > >> 1.7.0_79* as the default, which is where *JAVA_HOME* points now.
> > >> Before that, some nodes had *open-jdk* as the default and others had
> > >> no default at all, just the one installed by *ambari* in
> > >> */usr/jdk64/jdk1.7.0_67*, which was not linked to *JAVA_HOME* or the
> > >> *java* command via *alternatives*.
> > >>
> > >> *Failures in HammerDB:*
> > >>
> > >> Attached is the *trafodion.dtm.log* from a node on which I see a
> > >> lot of lines like these; I assume this is the *transaction conflict*
> > >> that you mentioned. I see these lines on 4 out of 5 nodes:
> > >>
> > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
> > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is
> > >> false
> > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is
> > >> false
> > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
> > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
> > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is
> > >> false
> > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint:
> > >> disableBlockCache is false
> > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush
> > >> is false
> > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT
> > >> prepareCommit, txid: 17179989222
> > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT
> > >> prepareCommit, txid: 17179989277
> > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT
> > >> prepareCommit, txid: 17179989309
> > >>
> > >> What does *transaction conflict* mean in this case?
> > >>
> > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
> > >> [email protected]> wrote:
> > >>
> > >>> Hi Radu,
> > >>>
> > >>> Thanks for using Trafodion. With the help from Suresh, we looked at
> > >>> the core files in your cluster. We believe that there are two
> > >>> scenarios that are causing the Trafodion processes to dump core.
> > >>>
> > >>> Scenario 1:
> > >>> Core dumped by tdm_arkesp processes. The Trafodion engine has
> > >>> assumed that the Committed_AS entry in /proc/meminfo is available in
> > >>> all flavors of Linux. The absence of this entry is not handled
> > >>> correctly by the Trafodion tdm_arkesp process, and hence it dumped
> > >>> core. Please file a
> > >>> JIRA using this link
> > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa and
> > >>> choose "Apache Trafodion" as the project to report a bug against.
> > >>>
> > >>> Scenario 2:
> > >>> Core dumped by tdm_udrserv processes. From our analysis, this
> > >>> problem happened when the process attempted to create the JVM
> > >>> instance programmatically. A few days earlier, we observed a
> > >>> similar issue in your cluster when the java -version command was
> > >>> attempted. But java -version and $JAVA_HOME/bin/java -version work
> > >>> fine now. Was there any change made to the cluster recently that
> > >>> fixed the problem with the java -version command?
> > >>>
> > >>> Can you please delete all the core files in the sql/scripts
> > >>> directory, issue the command to invoke the SPJ, and check if it still
> > >>> dumps core? We can look at the core file if it happens again. Your
> > >>> solution to the java -version problem would also be helpful.
> > >>>
> > >>> For the failures with HammerDB, can you please send us the exact
> > >>> error message returned by the Trafodion engine to the application.
> > >>> This might help us to narrow down the cause. You can also look at
> > >>> $MY_SQROOT/logs/trafodion.dtm.log to check if any transaction
> > >>> conflict is causing this error.
> > >>>
> > >>> Selva
> > >>> -----Original Message-----
> > >>> From: Radu Marias [mailto:[email protected]]
> > >>> Sent: Tuesday, September 15, 2015 9:09 AM
> > >>> To: dev <[email protected]>
> > >>> Subject: Re: odbc and/or hammerdb logs
> > >>>
> > >>> Also noticed there are several core. files from today in
> > >>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed
> > >>> please provide a gmail address so I can share them via gdrive.
> > >>>
> > >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <[email protected]>
> > >>> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> > I'm running HammerDB over trafodion and when running virtual users
> > >>> > sometimes I get errors like this in hammerdb logs:
> > >>> > *Vuser 1:Failed to execute payment*
> > >>> >
> > >>> > *Vuser 1:Failed to execute new order*
> > >>> >
> > >>> > I'm using unixODBC and I tried to add these lines in
> > >>> > */etc/odbc.ini*, but the trace file is not created:
> > >>> > *[ODBC]*
> > >>> > *Trace = 1*
> > >>> > *TraceFile = /var/log/odbc_tracefile.log*
> > >>> >
> > >>> > Also tried with *Trace = yes* and *Trace = on*, I've found
> > >>> > multiple references for both.
> > >>> >
> > >>> > How can I see more logs to debug the issue? Can I enable logs for
> > >>> > all queries in trafodion?
> > >>> >
> > >>> > --
> > >>> > And in the end, it's not the years in your life that count. It's
> > >>> > the life in your years.
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>



-- 
And in the end, it's not the years in your life that count. It's the life
in your years.
