I did the steps mentioned above to ensure that the Trafodion processes are
free of the JAVA installation mixup. I also changed the setup so that HDP,
Trafodion, and HammerDB all use the same JDK from */usr/jdk64/jdk1.7.0_67*:

# java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

# echo $JAVA_HOME
/usr/jdk64/jdk1.7.0_67
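
A quick way to confirm the JDK really is the same everywhere could be a loop
like this (a sketch only; the node names and password-less ssh access are
assumptions about this cluster):

```shell
# Sketch: verify every node resolves JAVA_HOME and the `java` command to
# the same JDK. NODES is a hypothetical host list; replace it with the
# cluster's real nodes.
NODES="node1 node2 node3 node4 node5"
for n in $NODES; do
  printf '== %s ==\n' "$n"
  ssh "$n" 'echo "JAVA_HOME=$JAVA_HOME";
            readlink -f "$(command -v java)";
            java -version 2>&1 | head -1'
done
```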

But when running HammerDB I again got a crash on 2 nodes. I noticed that
for about one minute before the crash I was getting errors from *java
-version*, and about 30 seconds after the crash *java -version* worked
again, so these issues might be related. I haven't yet found the cause of
the *java -version* failure or how to fix it.

# java -version
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit
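
Since the failure seems to come and go with memory pressure, a probe loop
like the one below could tie the two together (a hedged sketch: the log
path and the two-minute window are assumptions; the counter names come
from /proc/meminfo):

```shell
# Sketch: once a second, record whether `java -version` succeeds together
# with the memory counters from /proc/meminfo, to correlate the transient
# JVM-start failures with memory pressure at crash time.
LOG=/tmp/jvm_mem_probe.log
for i in $(seq 1 120); do
  ts=$(date '+%F %T')
  if java -version >/dev/null 2>&1; then ok=OK; else ok=FAIL; fi
  # Pull the commit-accounting fields; Committed_AS is the one Trafodion
  # already reads in sql/cli/memmonitor.cpp.
  mem=$(awk '/^(MemFree|CommitLimit|Committed_AS):/ {gsub(":","",$1); printf "%s=%skB ", $1, $2}' /proc/meminfo)
  echo "$ts java=$ok $mem" >> "$LOG"
  sleep 1
done
```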

# file core.5813
core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'tdm_udrserv SQMON1.1 00000 00000 005813 $Z0004R3 188.138.61.175:48357
00004 000'

#0  0x00007f6920ba0625 in raise () from /lib64/libc.so.6
#1  0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
#2  0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR Server
Internal Error", msg2=<value optimized out>, msg3=0x7fff119787f0 "Source
file information unavailable",
    msg4=0x7fff11977ff0 "User routine being processed :
TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure, Language Type :
JAVA, Error occurred outside the user routine code", msg5=0x43ddc3 "",
dialOut=<value optimized out>, writeToSeaLog=1) at
../udrserv/UdrFFDC.cpp:191
#3  0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The Java
virtual machine aborted", file=<value optimized out>, line=<value optimized
out>, dialOut=1, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:219
#4  0x00007f69232316b8 in LmJavaHooks::abortHookJVM () at
../langman/LmJavaHooks.cpp:54
#5  0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#6  0x00007f6922afedba in Universe::initialize_heap() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#7  0x00007f6922afff89 in universe_init() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#8  0x00007f692273d9f5 in init_globals() () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#9  0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*, bool*) ()
from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#10 0x00007f69227c5a34 in JNI_CreateJavaVM () from
/usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
#11 0x00007f692322de51 in LmLanguageManagerJava::initialize (this=<value
optimized out>, result=<value optimized out>, maxLMJava=<value optimized
out>, userOptions=0x7f69239ba418, diagsArea=<value optimized out>) at
../langman/LmLangManagerJava.cpp:379
#12 0x00007f692322f564 in LmLanguageManagerJava::LmLanguageManagerJava
(this=0x7f69239bec38, result=@0x7fff1197e19c, commandLineMode=<value
optimized out>, maxLMJava=1, userOptions=0x7f69239ba418,
diagsArea=0x7f6923991780) at ../langman/LmLangManagerJava.cpp:155
#13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM
(this=0x7f69239ba040, result=@0x7fff1197e19c, diags=<value optimized out>)
at ../udrserv/udrglobals.cpp:322
#14 0x0000000000427328 in processALoadMessage (UdrGlob=0x7f69239ba040,
msgStream=..., request=..., env=<value optimized out>) at
../udrserv/udrload.cpp:163
#15 0x000000000042fbfd in processARequest (UdrGlob=0x7f69239ba040,
msgStream=..., env=...) at ../udrserv/udrserv.cpp:660
#16 0x000000000043269c in runServer (argc=2, argv=0x7fff1197e528) at
../udrserv/udrserv.cpp:520
#17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528) at
../udrserv/udrserv.cpp:356
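
Taking the trace in gdb batch mode could make this repeatable (a sketch;
the bin64 path under $MY_SQROOT is an assumption, and it must run on the
same node that produced the core or the frames will not resolve):

```shell
# Sketch: dump a full backtrace from a udrserv core file non-interactively.
CORE=core.5813
BIN="$MY_SQROOT/export/bin64/tdm_udrserv"   # assumed binary location
gdb --batch -ex 'bt full' "$BIN" "$CORE" > "${CORE}.bt.txt" 2>&1
head -5 "${CORE}.bt.txt"
```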

On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <[email protected]>
wrote:

> Hi,
>
> I have added a wiki page that describes how to get a stack trace from a
> core file. The page could do with some improvements on finding the core
> file and maybe even doing more than getting the stack trace. For now it
> should make our troubleshooting cycle faster if the stack trace is included
> in the initial message itself.
>
> https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
>
> In this case, the last node does not seem to have gdb, so I could not see
> the trace there. I moved the core file to the first node but then the trace
> looks like this. I assume this is because I moved the core file to a
> different node. I think Selva's suggestion is good to try. We may have had
> a few tdm_udrserv processes from before the time the java change was made.
>
> $ gdb tdm_udrserv core.49256
> #0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
> #1  0x8857780a58ff2155 in ?? ()
> Cannot access memory at address 0x8857780a58ff2155
>
> The back trace we saw yesterday when a udrserv process exited when JVM
> could not be started is used in the wiki page instead of this one. If you
> have time a JIRA on this unexpected udrserv exit will also be valuable for
> the Trafodion team.
>
> Thanks
> Suresh
>
> On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
> [email protected]> wrote:
>
> > Thanks for creating the JIRA Trafodion-1492. The error is similar to
> > scenario 2. The process tdm_udrserv dumped core. We will look into the
> > core file. In the meantime, can you please do the following:
> >
> > Bring the Trafodion instance down
> > echo $MY_SQROOT -- shows Trafodion installation directory
> > Remove $MY_SQROOT/etc/ms.env from all nodes
> >
> >
> > Start a New Terminal Session so that new Java settings are in place
> > Login as a Trafodion user
> > cd <trafodion_installation_directory>
> > . ./sqenv.sh  (skip this if it is done automatically upon logon)
> > sqgen
> >
> > Exit and Start a New Terminal Session
> > Restart the Trafodion instance and check if you are seeing the issue with
> > tdm_udrserv again. As mentioned in your earlier message, we wanted to
> > ensure that the Trafodion processes are free of the JAVA installation
> > mixup. We suspect it can cause the tdm_udrserv process to dump core.
> >
> >
> > Selva
> >
> > -----Original Message-----
> > From: Radu Marias [mailto:[email protected]]
> > Sent: Wednesday, September 16, 2015 5:40 AM
> > To: dev <[email protected]>
> > Subject: Re: odbc and/or hammerdb logs
> >
> > I'm seeing this in the HammerDB logs; I assume it is due to the crash and
> > some processes being stopped:
> >
> > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion Database] SQL
> > ERROR:*** ERROR[2034] $Z0106BZ:16: Operating system error 201 while
> > communicating with server process $Z010LPE:23. [2015-09-16 12:35:33]
> > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[8904] SQL
> > did not receive a reply from MXUDR, possibly caused by internal errors
> > when executing user-defined routines. [2015-09-16 12:35:33]
> >
> > $ sqcheck
> > Checking if processes are up.
> > Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
> >
> > The SQ environment is up!
> >
> >
> > Process         Configured      Actual      Down
> > -------         ----------      ------      ----
> > DTM             5               5
> > RMS             10              10
> > MXOSRVR         20              20
> >
> > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <[email protected]>
> wrote:
> >
> > > I've restarted HDP and Trafodion and now I managed to create the
> > > schema and stored procedures from HammerDB. But I'm again getting
> > > failures and core dumps from Trafodion while running virtual users.
> > > For some of the users I sometimes see in the HammerDB logs:
> > > Vuser 5:Failed to execute payment
> > > Vuser 5:Failed to execute stock level
> > > Vuser 5:Failed to execute new order
> > >
> > > Core files are on our last node; feel free to examine them. The files
> > > were dumped while getting the HammerDB errors:
> > >
> > > *core.49256*
> > >
> > > *core.48633*
> > >
> > > *core.49290*
> > >
> > >
> > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <[email protected]>
> > wrote:
> > >
> > >> *Scenario 1:*
> > >>
> > >> I've created this issue
> > >> https://issues.apache.org/jira/browse/TRAFODION-1492
> > >> I think another fix was made related to *Committed_AS* in
> > >> *sql/cli/memmonitor.cpp*.
> > >>
> > >> This is a response from Narendra in a previous thread where the issue
> > >> was fixed to start the trafodion:
> > >>
> > >>
> > >>>
> > >>>
> > >>>
> > >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
> > >>> /proc/meminfo does not have the ‘Committed_AS’ entry, it will ignore
> > >>> it. Built it and put the binary: libcli.so on the veracity box (in
> > >>> the $MY_SQROOT/export/lib64 directory – on all the nodes). Restarted
> > >>> the env and ‘sqlci' worked fine.
> > >>> Was able to ‘initialize trafodion’ and create a table.*
> > >>
> > >>
> > >> *Scenario 2:*
> > >>
> > >> As I recall, we had the *java -version* problem only on the other
> > >> cluster with CentOS 7; I didn't see it on this one with CentOS 6.7.
> > >> But a change I made these days on the latter is installing Oracle *jdk
> > >> 1.7.0_79* as the default, which is where *JAVA_HOME* points now.
> > >> Before that, some nodes had *open-jdk* as the default and others had
> > >> no default at all, just the one installed by *ambari* in
> > >> */usr/jdk64/jdk1.7.0_67*, which was not linked to *JAVA_HOME* or the
> > >> *java* command via *alternatives*.
> > >>
> > >> *Failures in HammerDB:*
> > >>
> > >> Attached is the *trafodion.dtm.log* from a node on which I see a
> > >> lot of lines like these; I assume this is the *transaction conflict*
> > >> that you mentioned. I see these lines on 4 out of 5 nodes:
> > >>
> > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
> > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is
> > >> false
> > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is
> > >> false
> > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
> > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
> > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is
> > >> false
> > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint:
> > >> disableBlockCache is false
> > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush
> > >> is false
> > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT
> > >> prepareCommit, txid: 17179989222
> > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT
> > >> prepareCommit, txid: 17179989277
> > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT
> > >> prepareCommit, txid: 17179989309
> > >>
> > >> What does *transaction conflict* mean in this case?
> > >>
> > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
> > >> [email protected]> wrote:
> > >>
> > >>> Hi Radu,
> > >>>
> > >>> Thanks for using Trafodion. With the help from Suresh, we looked at
> > >>> the core files in your cluster. We believe that there are two
> > >>> scenarios that are causing the Trafodion processes to dump core.
> > >>>
> > >>> Scenario 1:
> > >>> Core dumped by tdm_arkesp processes. The Trafodion engine has
> > >>> assumed that the Committed_AS entry in /proc/meminfo is available in
> > >>> all flavors of Linux. The absence of this entry is not handled
> > >>> correctly by the Trafodion tdm_arkesp process, and hence it dumped
> > >>> core. Please file a
> > >>> JIRA using this link
> > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa and
> > >>> choose "Apache Trafodion" as the project to report a bug against.
> > >>>
> > >>> Scenario 2:
> > >>> Core dumped by tdm_udrserv processes. From our analysis, this
> > >>> problem happened when the process attempted to create the JVM
> > >>> instance programmatically. A few days earlier, we observed a
> > >>> similar issue in your cluster when the java -version command was
> > >>> attempted. But java -version and $JAVA_HOME/bin/java -version work
> > >>> fine now. Was there any change made to the cluster recently that
> > >>> fixed the problem with the java -version command?
> > >>>
> > >>> Can you please delete all the core files in the sql/scripts
> > >>> directory, issue the command to invoke the SPJ, and check if it still
> > >>> dumps core? We can look at the core file if it happens again. Your
> > >>> solution to the java -version problem would also be helpful.
> > >>>
> > >>> For the failures with HammerDB, can you please send us the exact
> > >>> error message returned by the Trafodion engine to the application.
> > >>> This might help us to narrow down the cause. You can also look at
> > >>> $MY_SQROOT/logs/trafodion.dtm.log to check if any transaction
> > >>> conflict is causing this error.
> > >>>
> > >>> Selva
> > >>> -----Original Message-----
> > >>> From: Radu Marias [mailto:[email protected]]
> > >>> Sent: Tuesday, September 15, 2015 9:09 AM
> > >>> To: dev <[email protected]>
> > >>> Subject: Re: odbc and/or hammerdb logs
> > >>>
> > >>> Also noticed there are several core. files from today in
> > >>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed
> > >>> please provide a gmail address so I can share them via gdrive.
> > >>>
> > >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <[email protected]>
> > >>> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> > I'm running HammerDB over trafodion and when running virtual users
> > >>> > sometimes I get errors like this in hammerdb logs:
> > >>> > *Vuser 1:Failed to execute payment*
> > >>> >
> > >>> > *Vuser 1:Failed to execute new order*
> > >>> >
> > >>> > I'm using unixODBC and I tried to add these lines in
> > >>> > */etc/odbc.ini*, but the trace file is not created:
> > >>> > *[ODBC]*
> > >>> > *Trace = 1*
> > >>> > *TraceFile = /var/log/odbc_tracefile.log*
> > >>> >
> > >>> > Also tried with *Trace = yes* and *Trace = on*, I've found
> > >>> > multiple references for both.
> > >>> >
> > >>> > How can I see more logs to debug the issue? Can I enable logs for
> > >>> > all queries in trafodion?
> > >>> >
> > >>> > --
> > >>> > And in the end, it's not the years in your life that count. It's
> > >>> > the life in your years.
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>



-- 
And in the end, it's not the years in your life that count. It's the life
in your years.
