I logged this issue several days ago; is this OK?
https://issues.apache.org/jira/browse/TRAFODION-1492

Will try with one user and let you know.

On Fri, Sep 18, 2015 at 7:05 AM, Suresh Subbiah <[email protected]>
wrote:

> Hi
>
> How many virtual users are being used? If it is more than one, could we
> please try the case with one user first.
>
> When the crash happens next time could we please try
> sqps | grep esp | wc -l
>
> If this number is large, we know a lot of ESP processes are being started,
> which could consume memory.
> If this is the case, please insert this row into the defaults table from
> sqlci and then restart DCS (dcsstop followed by dcsstart):
> insert into "_MD_".defaults values('ATTEMPT_ESP_PARALLELISM', 'OFF',
> 'hammerdb testing') ;
> exit ;
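As a quick sketch (assuming `sqps` output on stdin; this is not an official Trafodion tool), the ESP count check can be wrapped in a small shell function:

```shell
# Count ESP processes in a process listing piped on stdin.
# On a Trafodion node you would feed it `sqps` output:
#   sqps | count_esp_processes
count_esp_processes() {
  grep -c esp
}
```

A large count here would confirm that many ESPs are being started before trying the ATTEMPT_ESP_PARALLELISM workaround.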
>
> I will work on having the UDR process create a JVM with a smaller initial
> heap size. If you have time and would like to do so, a JIRA filed by you
> will be helpful; otherwise I can file the JIRA and work on it. It will not
> take long to make this change.
>
> Thanks
> Suresh
>
> PS: I found this command on Stack Overflow to determine the initial heap
> size we get by default in this environment:
>
> java -XX:+PrintFlagsFinal -version | grep HeapSize
>
>
>
>
> http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
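For reference, HotSpot's documented server-class ergonomics in this JDK 7 era pick roughly 1/64 of physical memory as the initial heap and 1/4 as the maximum. A rough sketch of that rule (ignoring the per-build caps and command-line flags, which is why the `-XX:+PrintFlagsFinal` check above remains the authoritative answer):

```shell
# Approximate JDK 7 HotSpot server-class heap defaults (caps/flags ignored):
#   initial heap ~= physical RAM / 64,  max heap ~= physical RAM / 4
# Argument is physical memory in MB; result is printed in MB.
default_initial_heap_mb() { echo $(( $1 / 64 )); }
default_max_heap_mb()     { echo $(( $1 / 4 )); }
```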
>
>
>
> On Thu, Sep 17, 2015 at 10:32 AM, Radu Marias <[email protected]>
> wrote:
>
> > I did the steps mentioned above to ensure that the Trafodion processes
> > are free of the Java installation mixup.
> > I also changed things so that HDP, Trafodion, and HammerDB all use the
> > same JDK from */usr/jdk64/jdk1.7.0_67*:
> >
> > # java -version
> > java version "1.7.0_67"
> > Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> >
> > # echo $JAVA_HOME
> > /usr/jdk64/jdk1.7.0_67
> >
> > But when running HammerDB I again got a crash on 2 nodes. I noticed that
> > for about a minute before the crash I was getting errors from *java
> > -version*, and about 30 seconds after the crash *java -version* worked
> > again, so these issues might be related. I haven't yet found the cause of
> > the *java -version* failure or how to fix it.
> >
> > # java -version
> > Error occurred during initialization of VM
> > Could not reserve enough space for object heap
> > Error: Could not create the Java Virtual Machine.
> > Error: A fatal exception has occurred. Program will exit
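One way to correlate this with memory pressure (a diagnostic sketch, not something Trafodion ships) is to compare CommitLimit against Committed_AS in /proc/meminfo around the failure window; little or negative headroom would explain why the JVM cannot reserve its heap:

```shell
# Print remaining commit headroom in kB from a meminfo-format file
# (defaults to /proc/meminfo when called without an argument).
commit_headroom_kb() {
  awk '/^CommitLimit:/ {limit=$2} /^Committed_AS:/ {used=$2} END {print limit - used}' "${1:-/proc/meminfo}"
}
```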
> >
> > # file core.5813
> > core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
> > from 'tdm_udrserv SQMON1.1 00000 00000 005813 $Z0004R3
> > 188.138.61.175:48357
> > 00004 000'
> >
> > #0  0x00007f6920ba0625 in raise () from /lib64/libc.so.6
> > #1  0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
> > #2  0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR Server
> > Internal Error", msg2=<value optimized out>, msg3=0x7fff119787f0 "Source
> > file information unavailable",
> >     msg4=0x7fff11977ff0 "User routine being processed :
> > TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure, Language Type :
> > JAVA, Error occurred outside the user routine code", msg5=0x43ddc3 "",
> > dialOut=<value optimized out>, writeToSeaLog=1) at
> > ../udrserv/UdrFFDC.cpp:191
> > #3  0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The Java
> > virtual machine aborted", file=<value optimized out>, line=<value
> optimized
> > out>, dialOut=1, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:219
> > #4  0x00007f69232316b8 in LmJavaHooks::abortHookJVM () at
> > ../langman/LmJavaHooks.cpp:54
> > #5  0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() () from
> > /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > #6  0x00007f6922afedba in Universe::initialize_heap() () from
> > /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > #7  0x00007f6922afff89 in universe_init() () from
> > /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > #8  0x00007f692273d9f5 in init_globals() () from
> > /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > #9  0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*, bool*) ()
> > from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > #10 0x00007f69227c5a34 in JNI_CreateJavaVM () from
> > /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > #11 0x00007f692322de51 in LmLanguageManagerJava::initialize (this=<value
> > optimized out>, result=<value optimized out>, maxLMJava=<value optimized
> > out>, userOptions=0x7f69239ba418, diagsArea=<value optimized out>) at
> > ../langman/LmLangManagerJava.cpp:379
> > #12 0x00007f692322f564 in LmLanguageManagerJava::LmLanguageManagerJava
> > (this=0x7f69239bec38, result=@0x7fff1197e19c, commandLineMode=<value
> > optimized out>, maxLMJava=1, userOptions=0x7f69239ba418,
> > diagsArea=0x7f6923991780) at ../langman/LmLangManagerJava.cpp:155
> > #13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM
> > (this=0x7f69239ba040, result=@0x7fff1197e19c, diags=<value optimized
> out>)
> > at ../udrserv/udrglobals.cpp:322
> > #14 0x0000000000427328 in processALoadMessage (UdrGlob=0x7f69239ba040,
> > msgStream=..., request=..., env=<value optimized out>) at
> > ../udrserv/udrload.cpp:163
> > #15 0x000000000042fbfd in processARequest (UdrGlob=0x7f69239ba040,
> > msgStream=..., env=...) at ../udrserv/udrserv.cpp:660
> > #16 0x000000000043269c in runServer (argc=2, argv=0x7fff1197e528) at
> > ../udrserv/udrserv.cpp:520
> > #17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528) at
> > ../udrserv/udrserv.cpp:356
> >
> > On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <
> > [email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I have added a wiki page that describes how to get a stack trace from a
> > > core file. The page could do with some improvements on finding the core
> > > file, and maybe even on doing more than getting the stack trace. For now
> > > it should make our troubleshooting cycle faster if the stack trace is
> > > included in the initial message itself.
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
> > >
> > > In this case, the last node does not seem to have gdb, so I could not
> > > see the trace there. I moved the core file to the first node, but then
> > > the trace looks like this; I assume that is because I moved the core
> > > file to a different node. I think Selva's suggestion is good to try. We
> > > may have had a few tdm_udrserv processes from before the time the Java
> > > change was made.
> > >
> > > $ gdb tdm_udrserv core.49256
> > > #0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
> > > #1  0x8857780a58ff2155 in ?? ()
> > > Cannot access memory at address 0x8857780a58ff2155
> > >
> > > The backtrace we saw yesterday, when a udrserv process exited because
> > > the JVM could not be started, is used on the wiki page instead of this
> > > one. If you have time, a JIRA on this unexpected udrserv exit would also
> > > be valuable to the Trafodion team.
> > >
> > > Thanks
> > > Suresh
> > >
> > > On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > Thanks for creating the JIRA TRAFODION-1492. The error is similar to
> > > > scenario 2: the tdm_udrserv process dumped core. We will look into the
> > > > core file. In the meantime, can you please do the following:
> > > >
> > > > Bring the Trafodion instance down
> > > > echo $MY_SQROOT -- shows Trafodion installation directory
> > > > Remove $MY_SQROOT/etc/ms.env from all nodes
> > > >
> > > >
> > > > Start a New Terminal Session so that new Java settings are in place
> > > > Login as a Trafodion user
> > > > cd <trafodion_installation_directory>
> > > > . ./sqenv.sh  (skip this if it is done automatically upon logon)
> > > > sqgen
> > > >
> > > > Exit and Start a New Terminal Session
> > > > Restart the Trafodion instance and check whether you are seeing the
> > > > issue with tdm_udrserv again. We want to ensure that the Trafodion
> > > > processes are free of the Java installation mixup from your earlier
> > > > message; we suspect it can cause the tdm_udrserv process to dump core.
> > > >
> > > >
> > > > Selva
> > > >
> > > > -----Original Message-----
> > > > From: Radu Marias [mailto:[email protected]]
> > > > Sent: Wednesday, September 16, 2015 5:40 AM
> > > > To: dev <[email protected]>
> > > > Subject: Re: odbc and/or hammerdb logs
> > > >
> > > > I'm seeing this in the HammerDB logs; I assume it is due to the crash
> > > > and some processes being stopped:
> > > >
> > > > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion Database]
> > SQL
> > > > ERROR:*** ERROR[2034] $Z0106BZ:16: Operating system error 201 while
> > > > communicating with server process $Z010LPE:23. [2015-09-16 12:35:33]
> > > > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[8904]
> > SQL
> > > > did not receive a reply from MXUDR, possibly caused by internal
> errors
> > > when
> > > > executing user-defined routines. [2015-09-16 12:35:33]
> > > >
> > > > $ sqcheck
> > > > Checking if processes are up.
> > > > Checking attempt: 1; user specified max: 2. Execution time in
> seconds:
> > 0.
> > > >
> > > > The SQ environment is up!
> > > >
> > > >
> > > > Process         Configured      Actual      Down
> > > > -------         ----------      ------      ----
> > > > DTM             5               5
> > > > RMS             10              10
> > > > MXOSRVR         20              20
> > > >
> > > > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <[email protected]>
> > > wrote:
> > > >
> > > > > I've restarted HDP and Trafodion, and now I managed to create the
> > > > > schema and stored procedures from HammerDB. But I'm again getting
> > > > > failures and core dumps from Trafodion while running virtual users.
> > > > > For some of the users I sometimes see in the HammerDB logs:
> > > > > Vuser 5:Failed to execute payment
> > > > > Vuser 5:Failed to execute stock level
> > > > > Vuser 5:Failed to execute new order
> > > > >
> > > > > Core files are on our last node; feel free to examine them. The
> > > > > files were dumped while getting the HammerDB errors:
> > > > >
> > > > > *core.49256*
> > > > >
> > > > > *core.48633*
> > > > >
> > > > > *core.49290*
> > > > >
> > > > >
> > > > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <[email protected]
> >
> > > > wrote:
> > > > >
> > > > >> *Scenario 1:*
> > > > >>
> > > > >> I've created this issue
> > > > >> https://issues.apache.org/jira/browse/TRAFODION-1492
> > > > >> I think another fix was made related to *Committed_AS* in
> > > > >> *sql/cli/memmonitor.cpp*.
> > > > >>
> > > > >> This is a response from Narendra in a previous thread, where the
> > > > >> issue preventing Trafodion from starting was fixed:
> > > > >>
> > > > >>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
> > > > >>> /proc/meminfo does not have the ‘Committed_AS’ entry, it will
> > ignore
> > > > >>> it. Built it and put the binary: libcli.so on the veracity box
> (in
> > > > >>> the $MY_SQROOT/export/lib64 directory – on all the nodes).
> > Restarted
> > > > the
> > > > >>> env and ‘sqlci’ worked fine.
> > > > >>> Was able to ‘initialize trafodion’ and create a table.*
> > > > >>
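In shell terms, the tolerant lookup that fix describes amounts to something like this (a sketch of the behavior only, not the actual C++ in sql/cli/memmonitor.cpp):

```shell
# Read Committed_AS (kB) from a meminfo-format file; print 0 when the
# entry is absent instead of failing, mirroring the described fix.
committed_as_kb() {
  awk '/^Committed_AS:/ {print $2; found=1} END {if (!found) print 0}' "${1:-/proc/meminfo}"
}
```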
> > > > >>
> > > > >> *Scenario 2:*
> > > > >>
> > > > >> I recall we had the *java -version* problem only on the other
> > > > >> cluster with CentOS 7; I didn't see it on this one with CentOS 6.7.
> > > > >> But one change I made these days on the latter is installing Oracle
> > > > >> *jdk 1.7.0_79* as the default one, and it is where *JAVA_HOME*
> > > > >> points to. Before that, some nodes had *open-jdk* as the default
> > > > >> and others had only the one installed by *ambari* in
> > > > >> */usr/jdk64/jdk1.7.0_67*, which was not linked to JAVA_HOME or the
> > > > >> *java* command via *alternatives*.
> > > > >>
> > > > >> *Failures in HammerDB:*
> > > > >>
> > > > >> Attached is the *trafodion.dtm.log* from a node on which I see a
> > > > >> lot of lines like these; I assume this is the *transaction
> > > > >> conflict* that you mentioned. I see these lines on 4 out of 5
> > > > >> nodes:
> > > > >>
> > > > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is
> true
> > > > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is
> > > > >> false
> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is
> > > > >> false
> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is
> false
> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is
> false
> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is
> > > > >> false
> > > > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint:
> > > > >> disableBlockCache is false
> > > > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint:
> > useAutoFlush
> > > > >> is false
> > > > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit
> RET_HASCONFLICT
> > > > >> prepareCommit, txid: 17179989222
> > > > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit
> RET_HASCONFLICT
> > > > >> prepareCommit, txid: 17179989277
> > > > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit
> RET_HASCONFLICT
> > > > >> prepareCommit, txid: 17179989309
> > > > >>
> > > > >> What *transaction conflict* means in this case?
> > > > >>
> > > > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
> > > > >> [email protected]> wrote:
> > > > >>
> > > > >>> Hi Radu,
> > > > >>>
> > > > >>> Thanks for using Trafodion. With help from Suresh, we looked at
> > > > >>> the core files in your cluster. We believe there are two scenarios
> > > > >>> causing the Trafodion processes to dump core.
> > > > >>>
> > > > >>> Scenario 1:
> > > > >>> Core dumped by tdm_arkesp processes. The Trafodion engine assumed
> > > > >>> that the Committed_AS entry in /proc/meminfo is available in all
> > > > >>> flavors of Linux. The absence of this entry is not handled
> > > > >>> correctly by the tdm_arkesp process, and hence it dumped core.
> > > > >>> Please file a JIRA using this link
> > > > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa and
> > > > >>> choose "Apache Trafodion" as the project to report a bug against.
> > > > >>>
> > > > >>> Scenario 2:
> > > > >>> Core dumped by tdm_udrserv processes. From our analysis, this
> > > > >>> problem happened when the process attempted to create the JVM
> > > > >>> instance programmatically. A few days earlier, we observed a
> > > > >>> similar issue in your cluster when the java -version command was
> > > > >>> attempted, but java -version (or $JAVA_HOME/bin/java -version)
> > > > >>> works fine now.
> > > > >>> Was any change made to the cluster recently to avoid the problem
> > > > >>> with the java -version command?
> > > > >>>
> > > > >>> Could you please delete all the core files in the sql/scripts
> > > > >>> directory, issue the command to invoke the SPJ, and check whether
> > > > >>> it still dumps core? We can look at the core file if it happens
> > > > >>> again. Your solution to the java -version problem would also be
> > > > >>> helpful.
> > > > >>>
> > > > >>> For the failures with HammerDB, can you please send us the exact
> > > > >>> error message returned by the Trafodion engine to the application?
> > > > >>> This might help us narrow down the cause. You can also look at
> > > > >>> $MY_SQROOT/logs/trafodion.dtm.log to check whether a transaction
> > > > >>> conflict is causing this error.
> > > > >>>
> > > > >>> Selva
> > > > >>> -----Original Message-----
> > > > >>> From: Radu Marias [mailto:[email protected]]
> > > > >>> Sent: Tuesday, September 15, 2015 9:09 AM
> > > > >>> To: dev <[email protected]>
> > > > >>> Subject: Re: odbc and/or hammerdb logs
> > > > >>>
> > > > >>> I also noticed there are several core files from today in
> > > > >>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed,
> > > > >>> please provide a Gmail address so I can share them via Google Drive.
> > > > >>>
> > > > >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <
> [email protected]
> > >
> > > > >>> wrote:
> > > > >>>
> > > > >>> > Hi,
> > > > >>> >
> > > > >>> > I'm running HammerDB over Trafodion, and when running virtual
> > > > >>> > users I sometimes get errors like this in the HammerDB logs:
> > > > >>> > *Vuser 1:Failed to execute payment*
> > > > >>> >
> > > > >>> > *Vuser 1:Failed to execute new order*
> > > > >>> >
> > > > >>> > I'm using unixODBC and I tried to add these lines in
> > > > >>> > */etc/odbc.ini*, but the trace file is not created:
> > > > >>> > *[ODBC]*
> > > > >>> > *Trace = 1*
> > > > >>> > *TraceFile = /var/log/odbc_tracefile.log*
> > > > >>> >
> > > > >>> > I also tried *Trace = yes* and *Trace = on*; I've found
> > > > >>> > multiple references for both.
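One thing worth checking: unixODBC usually reads driver-manager tracing from the [ODBC] section of odbcinst.ini rather than odbc.ini, which could be why no trace file appears. A small sketch that emits the stanza (the log path is just an example):

```shell
# Print a minimal unixODBC tracing stanza to append to odbcinst.ini.
# Argument: path for the trace file.
odbc_trace_stanza() {
  printf '[ODBC]\nTrace = Yes\nTraceFile = %s\n' "$1"
}
```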
> > > > >>> >
> > > > >>> > How can I see more logs to debug the issue? Can I enable logs
> > > > >>> > for all queries in Trafodion?
> > > > >>> >
> > > > >>> > --
> > > > >>> > And in the end, it's not the years in your life that count.
> It's
> > > > >>> > the life in your years.
> > > > >>> >
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



