Hi,

I added this in *$MY_SQROOT/etc/ms.env*:
*JAVA_TOOL_OPTIONS=-Xmx512m*

and this in *~trafodion/.bashrc*:
*alias java="java -Xmx512m"*

The alias also covers the JAVA_TOOL_OPTIONS setting from above, but this way
both the trafodion processes and the creation of SPJs from hammerdb work with
the 512m limit.

I managed to create the schema and run with 5 users in hammerdb for about 30
minutes with no crash in trafodion.

On Sat, Sep 19, 2015 at 12:36 AM, Selva Govindarajan <[email protected]> wrote:

> Hi Radu,
>
> In future Trafodion will ensure that the tdm_udrserv will be created with
> the proper amount of Java heap size. As a workaround you can set this
> variable
> JAVA_TOOL_OPTIONS=-Xmx512m in $MY_SQROOT/etc/ms.env
>
> Copy this ms.env to all nodes. Restart DCS environment.
>
> The tdm_udrserv will be created with 512mb of maximum Java heap size. That
> should be sufficient for most of the workloads.
>
> We will fix these issues in incubator-trafodion project soon.
>
> You might consider reducing the max heap size for other java programs too
> depending upon the type of work being done by your java programs. I feel
> 2GB might be a large value for client java programs.
>
> Selva
>
> -----Original Message-----
> From: Radu Marias [mailto:[email protected]]
> Sent: Friday, September 18, 2015 1:57 PM
> To: [email protected]
> Subject: Re: odbc and/or hammerdb logs
>
> I added it to /etc/bashrc and yes all users are seeing it. At least this
> script is executed in ~trafodion/.bashrc at the end
>
> On Fri, Sep 18, 2015, 21:31 Suresh Subbiah <[email protected]>
> wrote:
>
> > Thank you. This is a very useful find. Could you please explain where
> > JAVA_TOOL_OPTIONS is set, will all userids see this variable ?
> > In general I was under the impression that processes started through
> > Trafodion monitor got their env settings from ms.env alone. For
> > udrserv to see this setting, we may need to add it to ms.env (copy to
> > all nodes then) and restart Trafodion. Please do not do this till
> > someone else confirms though as I am not certain.
> >
> > Thanks
> > Suresh
> >
> >
> > On Fri, Sep 18, 2015 at 12:09 PM, Radu Marias <[email protected]>
> > wrote:
> >
> > > ok, it seems that it didn't last long, I've got again the java heap
> > > issue:
> > >
> > > # java -version
> > > Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
> > > Error occurred during initialization of VM
> > > Could not reserve enough space for object heap
> > > Error: Could not create the Java Virtual Machine.
> > > Error: A fatal exception has occurred. Program will exit.
> > >
> > > Still no crash in trafodion after those 30 minutes.
> > >
> > > On Fri, Sep 18, 2015 at 8:06 PM, Radu Marias <[email protected]>
> > > wrote:
> > >
> > > > I think I managed to fix the java -version heap issue with:
> > > > *export JAVA_TOOL_OPTIONS="-Xms128m -Xmx2048m"*
> > > > For processes that specify explicit values for Xms and Xmx, those
> > > > will override the JAVA_TOOL_OPTIONS as described here
> > > >
> > > > http://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts
> > > >
> > > > trafodion processes will take these values if they don't specify
> > > > others, I see lines like this when trafodion starts:
> > > > Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
> > > >
> > > > But in hammerdb when creating stored procedures I get this error:
> > > > Error in Virtual User 1: Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
> > > >
> > > > Fixed the hammerdb issue by manually creating the SPJ and indexes, but
> > > > at the first run with 5 users I got the crash from TRAFODION-1492.
> > > > After restarting trafodion I managed to run with 5 users for about 30
> > > > minutes. Will do more tests on Monday.
> > > >
> > > > On Fri, Sep 18, 2015 at 6:05 PM, Suresh Subbiah <
> > > > [email protected]> wrote:
> > > >
> > > >> Thank you.
> > > >>
> > > >> Trafodion-1492 can be used to track both the esp crash and the
> > > >> udrserv crash.
> > > >> The fixes will be in different areas, but that does not matter.
> > > >>
> > > >> I don't know much about containers or openVZ. Maybe others will know.
> > > >> Hope to have a fix ready soon for the udrserv crash problem, in case
> > > >> container settings cannot be changed.
> > > >>
> > > >> The general idea suggested by Selva is that we introduce env
> > > >> variables with min and max jvm heap size settings for the udrserv
> > > >> process (just like we have today for executor processes). Udrserv
> > > >> does have the idea of reading from a configuration file, so we could
> > > >> use that approach if that is preferable. Either way there should be
> > > >> some way to start a udrserv process with a smaller heap soon.
> > > >>
> > > >> Thanks
> > > >> Suresh
> > > >>
> > > >>
> > > >> On Fri, Sep 18, 2015 at 8:17 AM, Radu Marias <[email protected]>
> > > >> wrote:
> > > >>
> > > >> > The nodes are in OpenVZ containers and I noticed this:
> > > >> >
> > > >> > # cat /proc/user_beancounters
> > > >> > uid  resource       held      maxheld   barrier   limit     failcnt
> > > >> > *    privvmpages    6202505   9436485   9437184   9437184   1573*
> > > >> >
> > > >> > I assume this could be related to the java -version issue. Trying to
> > > >> > see if I can fix this, we are limited on what can be set from inside
> > > >> > the container.
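
A minimal sketch for reading these counters, assuming the whitespace-separated layout shown above and a 4 KiB page size (an assumption, typical for x86-64 but not stated in the thread). The helper names `parse_beancounters`, `failed_resources` and `peak_headroom_bytes` are hypothetical. It lists the resources whose failcnt is non-zero and how much room was left below the barrier at the recorded peak:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not confirmed in the thread


def parse_beancounters(text):
    """Parse /proc/user_beancounters text into {resource: counters}."""
    counters = {}
    for line in text.splitlines():
        # Mail clients may wrap a highlighted row in '*'; treat it as blank.
        fields = line.replace("*", " ").split()
        # The first data row starts with the container id, e.g. "10045785:".
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # A data row is a resource name followed by five integers.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        held, maxheld, barrier, limit, failcnt = map(int, fields[1:])
        counters[fields[0]] = {
            "held": held,
            "maxheld": maxheld,
            "barrier": barrier,
            "limit": limit,
            "failcnt": failcnt,
        }
    return counters


def failed_resources(counters):
    """Return only the resources that recorded allocation failures."""
    return {name: c for name, c in counters.items() if c["failcnt"] > 0}


def peak_headroom_bytes(counter, page_size=PAGE_SIZE):
    """Bytes left below the barrier at peak, for page-counted resources."""
    return (counter["barrier"] - counter["maxheld"]) * page_size


# Rows copied from the output quoted in this thread.
SAMPLE = """\
Version: 2.5
uid        resource      held       maxheld    barrier              limit                failcnt
10045785:  kmemsize      111794747  999153664  9223372036854775807  9223372036854775807  0
          *privvmpages   6202505    9436485    9437184              9437184              1573*
           physpages     1214672    6291456    6291456              6291456              0
"""

if __name__ == "__main__":
    parsed = parse_beancounters(SAMPLE)
    for name, c in failed_resources(parsed).items():
        print(name, "failcnt:", c["failcnt"],
              "peak headroom (bytes):", peak_headroom_bytes(c))
```

On the rows above only privvmpages is flagged, and its peak sits within a few hundred pages of the barrier. That would be consistent with a JVM failing to reserve its heap, although the thread does not confirm the cause at this point.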
> > > >> >
> > > >> > # cat /proc/user_beancounters
> > > >> > Version: 2.5
> > > >> > uid        resource      held       maxheld    barrier              limit                failcnt
> > > >> > 10045785:  kmemsize      111794747  999153664  9223372036854775807  9223372036854775807  0
> > > >> >            lockedpages   7970       7970       6291456              6291456              0
> > > >> >           *privvmpages   6202505    9436485    9437184              9437184              1573*
> > > >> >            shmpages      34617      36553      9223372036854775807  9223372036854775807  0
> > > >> >            dummy         0          0          9223372036854775807  9223372036854775807  0
> > > >> >            numproc       952        1299       30000                30000                0
> > > >> >            physpages     1214672    6291456    6291456              6291456              0
> > > >> >            vmguarpages   0          0          6291456              6291456              0
> > > >> >            oomguarpages  1096587    2121834    6291456              6291456              0
> > > >> >            numtcpsock    226        457        30000                30000                0
> > > >> >            numflock      5          16         1000                 1100                 0
> > > >> >            numpty        4          6          512                  512                  0
> > > >> >            numsiginfo    1          69         1024                 1024                 0
> > > >> >            tcpsndbuf     5637456    17822864   9223372036854775807  9223372036854775807  0
> > > >> >            tcprcvbuf     6061504    13730792   9223372036854775807  9223372036854775807  0
> > > >> >            othersockbuf  46240      1268016    9223372036854775807  9223372036854775807  0
> > > >> >            dgramrcvbuf   0          436104     9223372036854775807  9223372036854775807  0
> > > >> >            numothersock  89         134        30000                30000                0
> > > >> >            dcachesize    61381173   935378121  9223372036854775807  9223372036854775807  0
> > > >> >            numfile       7852       11005      250000               250000               0
> > > >> >            dummy         0          0          9223372036854775807  9223372036854775807  0
> > > >> >            dummy         0          0          9223372036854775807  9223372036854775807  0
> > > >> >            dummy         0          0          9223372036854775807  9223372036854775807  0
> > > >> >            numiptent     38         38         1000                 1000                 0
> > > >> >
> > > >> >
> > > >> > On Fri, Sep 18, 2015 at 3:34 PM, Radu Marias <[email protected]>
> > > >> > wrote:
> > > >> >
> > > >> > > With 1 user no crash occurs, but on the node on which hammerdb is
> > > >> > > started I noticed from time to time this:
> > > >> > >
> > > >> > > $ java -version
> > > >> > > Error occurred during initialization of VM
> > > >> > > Unable to allocate 199232KB bitmaps for parallel garbage collection
> > > >> > > for the requested 6375424KB heap.
> > > >> > > Error: Could not create the Java Virtual Machine.
> > > >> > > Error: A fatal exception has occurred. Program will exit.
> > > >> > >
> > > >> > > $ free -h
> > > >> > >                     total  used  free  shared  buffers  cached
> > > >> > > Mem:                24G    4.7G  19G   132M    0B       314M
> > > >> > > -/+ buffers/cache:         4.4G  19G
> > > >> > > Swap:               0B     0B    0B
> > > >> > >
> > > >> > >
> > > >> > > On Fri, Sep 18, 2015 at 2:21 PM, Radu Marias <[email protected]>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> $ $JAVA_HOME/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize
> > > >> > >>     uintx ErgoHeapSizeLimit           = 0           {product}
> > > >> > >>     uintx HeapSizePerGCThread         = 87241520    {product}
> > > >> > >>     uintx InitialHeapSize            := 402653184   {product}
> > > >> > >>     uintx LargePageHeapSizeThreshold  = 134217728   {product}
> > > >> > >>     uintx MaxHeapSize                := 6442450944  {product}
> > > >> > >> java version "1.7.0_67"
> > > >> > >> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> > > >> > >> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> > > >> > >>
> > > >> > >>
> > > >> > >> On Fri, Sep 18, 2015 at 2:20 PM, Radu Marias <[email protected]>
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >>> I've logged this issue several days ago, is this ok?
> > > >> > >>> https://issues.apache.org/jira/browse/TRAFODION-1492
> > > >> > >>>
> > > >> > >>> Will try with one user and let you know.
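
The heap sizes reported by -XX:+PrintFlagsFinal above can be cross-checked with a little arithmetic. A minimal sketch, assuming the 24G shown by free -h means exactly 24 GiB and that this JVM follows the documented server-class defaults (initial heap 1/64 and maximum heap 1/4 of physical memory). The helpers `default_heap_sizes` and `xmx_option` are hypothetical names used only for illustration:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_heap_sizes(physical_bytes):
    """Return (initial, maximum) default heap sizes in bytes.

    Assumes server-class ergonomics: 1/64 and 1/4 of physical memory.
    """
    return physical_bytes // 64, physical_bytes // 4


def xmx_option(max_heap_bytes):
    """Format a -Xmx option in whole megabytes, e.g. '-Xmx512m'."""
    return "-Xmx%dm" % (max_heap_bytes // MIB)


if __name__ == "__main__":
    # Assumption: "24G" from free -h is exactly 24 GiB.
    initial, maximum = default_heap_sizes(24 * GIB)
    print("default initial heap:", initial)
    print("default maximum heap:", maximum)
    print("a capped setting:", xmx_option(512 * MIB))
```

Under those assumptions both results equal the InitialHeapSize and MaxHeapSize values quoted above, so an unconfigured JVM on these nodes asks for a 6 GiB heap. That is a large reservation inside a container with a privvmpages barrier, which is why an explicit, smaller -Xmx is a reasonable thing to test.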
> > > >> > >>>
> > > >> > >>> On Fri, Sep 18, 2015 at 7:05 AM, Suresh Subbiah <
> > > >> > >>> [email protected]> wrote:
> > > >> > >>>
> > > >> > >>>> Hi
> > > >> > >>>>
> > > >> > >>>> How many Virtual users are being used? If it is more than one
> > > >> > >>>> could we please try the case with 1 user first.
> > > >> > >>>>
> > > >> > >>>> When the crash happens next time could we please try
> > > >> > >>>> sqps | grep esp | wc -l
> > > >> > >>>>
> > > >> > >>>> If this number is large we know a lot of esp processes are being
> > > >> > >>>> started which could consume memory.
> > > >> > >>>> If this is the case please insert this row into the defaults
> > > >> > >>>> table from sqlci and then restart dcs (dcsstop followed by
> > > >> > >>>> dcsstart)
> > > >> > >>>> insert into "_MD_".defaults values('ATTEMPT_ESP_PARALLELISM',
> > > >> > >>>> 'OFF', 'hammerdb testing') ;
> > > >> > >>>> exit ;
> > > >> > >>>>
> > > >> > >>>> I will work on having the udr process create a JVM with a
> > > >> > >>>> smaller initial heap size. If you have time and would like to do
> > > >> > >>>> so, a JIRA you file will be helpful. Or I can file the JIRA and
> > > >> > >>>> work on it. It will not take long to make this change.
> > > >> > >>>>
> > > >> > >>>> Thanks
> > > >> > >>>> Suresh
> > > >> > >>>>
> > > >> > >>>> PS I found this command from stackOverflow to determine the
> > > >> > >>>> initialHeapSize we get by default in this env
> > > >> > >>>>
> > > >> > >>>> java -XX:+PrintFlagsFinal -version | grep HeapSize
> > > >> > >>>>
> > > >> > >>>> http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
> > > >> > >>>>
> > > >> > >>>>
> > > >> > >>>> On Thu, Sep 17, 2015 at 10:32 AM, Radu Marias <[email protected]>
> > > >> > >>>> wrote:
> > > >> > >>>>
> > > >> > >>>> > Did the steps mentioned above to ensure that the trafodion
> > > >> > >>>> > processes are free of JAVA installation mixup.
> > > >> > >>>> > Also changed so that hdp, trafodion and hammerdb use the same
> > > >> > >>>> > jdk from */usr/jdk64/jdk1.7.0_67*
> > > >> > >>>> >
> > > >> > >>>> > # java -version
> > > >> > >>>> > java version "1.7.0_67"
> > > >> > >>>> > Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> > > >> > >>>> > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> > > >> > >>>> >
> > > >> > >>>> > # echo $JAVA_HOME
> > > >> > >>>> > /usr/jdk64/jdk1.7.0_67
> > > >> > >>>> >
> > > >> > >>>> > But when running hammerdb I got again a crash on 2 nodes. I
> > > >> > >>>> > noticed that before the crash for about one minute I'm getting
> > > >> > >>>> > errors for *java -version* and about 30 seconds after the crash
> > > >> > >>>> > the java -version worked again. So these issues might be related.
> > > >> > >>>> > Haven't yet found the problem and how to fix the java -version
> > > >> > >>>> > issue.
> > > >> > >>>> >
> > > >> > >>>> > # java -version
> > > >> > >>>> > Error occurred during initialization of VM
> > > >> > >>>> > Could not reserve enough space for object heap
> > > >> > >>>> > Error: Could not create the Java Virtual Machine.
> > > >> > >>>> > Error: A fatal exception has occurred. Program will exit
> > > >> > >>>> >
> > > >> > >>>> > # file core.5813
> > > >> > >>>> > core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> > > >> > >>>> > SVR4-style, from 'tdm_udrserv SQMON1.1 00000 00000 005813
> > > >> > >>>> > $Z0004R3 188.138.61.175:48357 00004 000'
> > > >> > >>>> >
> > > >> > >>>> > #0 0x00007f6920ba0625 in raise () from /lib64/libc.so.6
> > > >> > >>>> > #1 0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
> > > >> > >>>> > #2 0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR
> > > >> > >>>> >    Server Internal Error", msg2=<value optimized out>,
> > > >> > >>>> >    msg3=0x7fff119787f0 "Source file information unavailable",
> > > >> > >>>> >    msg4=0x7fff11977ff0 "User routine being processed :
> > > >> > >>>> >    TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure,
> > > >> > >>>> >    Language Type : JAVA, Error occurred outside the user routine
> > > >> > >>>> >    code", msg5=0x43ddc3 "", dialOut=<value optimized out>,
> > > >> > >>>> >    writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:191
> > > >> > >>>> > #3 0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The
> > > >> > >>>> >    Java virtual machine aborted", file=<value optimized out>,
> > > >> > >>>> >    line=<value optimized out>, dialOut=1, writeToSeaLog=1)
> > > >> > >>>> >    at ../udrserv/UdrFFDC.cpp:219
> > > >> > >>>> > #4 0x00007f69232316b8 in LmJavaHooks::abortHookJVM ()
> > > >> > >>>> >    at ../langman/LmJavaHooks.cpp:54
> > > >> > >>>> > #5 0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() ()
> > > >> > >>>> >    from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > > >> > >>>> > #6 0x00007f6922afedba in Universe::initialize_heap() ()
> > > >> > >>>> >    from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > > >> > >>>> > #7 0x00007f6922afff89 in universe_init() ()
> > > >> > >>>> >    from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > > >> > >>>> > #8 0x00007f692273d9f5 in init_globals() ()
> > > >> > >>>> >    from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > > >> > >>>> > #9 0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*,
> > > >> > >>>> >    bool*) ()
> > > >> > >>>> >    from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > > >> > >>>> > #10 0x00007f69227c5a34 in JNI_CreateJavaVM ()
> > > >> > >>>> >    from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> > > >> > >>>> > #11 0x00007f692322de51 in LmLanguageManagerJava::initialize
> > > >> > >>>> >    (this=<value optimized out>, result=<value optimized out>,
> > > >> > >>>> >    maxLMJava=<value optimized out>, userOptions=0x7f69239ba418,
> > > >> > >>>> >    diagsArea=<value optimized out>)
> > > >> > >>>> >    at ../langman/LmLangManagerJava.cpp:379
> > > >> > >>>> > #12 0x00007f692322f564 in
> > > >> > >>>> >    LmLanguageManagerJava::LmLanguageManagerJava
> > > >> > >>>> >    (this=0x7f69239bec38, result=@0x7fff1197e19c,
> > > >> > >>>> >    commandLineMode=<value optimized out>, maxLMJava=1,
> > > >> > >>>> >    userOptions=0x7f69239ba418, diagsArea=0x7f6923991780)
> > > >> > >>>> >    at ../langman/LmLangManagerJava.cpp:155
> > > >> > >>>> > #13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM
> > > >> > >>>> >    (this=0x7f69239ba040, result=@0x7fff1197e19c,
> > > >> > >>>> >    diags=<value optimized out>) at ../udrserv/udrglobals.cpp:322
> > > >> > >>>> > #14 0x0000000000427328 in processALoadMessage
> > > >> > >>>> >    (UdrGlob=0x7f69239ba040, msgStream=..., request=...,
> > > >> > >>>> >    env=<value optimized out>) at ../udrserv/udrload.cpp:163
> > > >> > >>>> > #15 0x000000000042fbfd in processARequest
> > > >> > >>>> >    (UdrGlob=0x7f69239ba040, msgStream=..., env=...)
> > > >> > >>>> >    at ../udrserv/udrserv.cpp:660
> > > >> > >>>> > #16 0x000000000043269c in runServer (argc=2,
> > > >> > >>>> >    argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:520
> > > >> > >>>> > #17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528)
> > > >> > >>>> >    at ../udrserv/udrserv.cpp:356
> > > >> > >>>> >
> > > >> > >>>> > On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <
> > > >> > >>>> > [email protected]> wrote:
> > > >> > >>>> >
> > > >> > >>>> > > Hi,
> > > >> > >>>> > >
> > > >> > >>>> > > I have added a wiki page that describes how to get a stack
> > > >> > >>>> > > trace from a core file. The page could do with some
> > > >> > >>>> > > improvements on finding the core file and maybe even doing
> > > >> > >>>> > > more than getting the stack trace. For now it should make our
> > > >> > >>>> > > troubleshooting cycle faster if the stack trace is included
> > > >> > >>>> > > in the initial message itself.
> > > >> > >>>> > >
> > > >> > >>>> > > https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
> > > >> > >>>> > >
> > > >> > >>>> > > In this case, the last node does not seem to have gdb, so I
> > > >> > >>>> > > could not see the trace there. I moved the core file to the
> > > >> > >>>> > > first node but then the trace looks like this. I assume this
> > > >> > >>>> > > is because I moved the core file to a different node. I think
> > > >> > >>>> > > Selva's suggestion is good to try. We may have had a few
> > > >> > >>>> > > tdm_udrserv processes from before the time the java change
> > > >> > >>>> > > was made.
> > > >> > >>>> > >
> > > >> > >>>> > > $ gdb tdm_udrserv core.49256
> > > >> > >>>> > > #0 0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
> > > >> > >>>> > > #1 0x8857780a58ff2155 in ?? ()
> > > >> > >>>> > > Cannot access memory at address 0x8857780a58ff2155
> > > >> > >>>> > >
> > > >> > >>>> > > The back trace we saw yesterday when a udrserv process exited
> > > >> > >>>> > > when JVM could not be started is used in the wiki page
> > > >> > >>>> > > instead of this one. If you have time a JIRA on this
> > > >> > >>>> > > unexpected udrserv exit will also be valuable for the
> > > >> > >>>> > > Trafodion team.
> > > >> > >>>> > >
> > > >> > >>>> > > Thanks
> > > >> > >>>> > > Suresh
> > > >> > >>>> > >
> > > >> > >>>> > > On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
> > > >> > >>>> > > [email protected]> wrote:
> > > >> > >>>> > >
> > > >> > >>>> > > > Thanks for creating the JIRA Trafodion-1492. The error is
> > > >> > >>>> > > > similar to scenario-2. The process tdm_udrserv dumped core.
> > > >> > >>>> > > > We will look into the core file. In the meantime, can you
> > > >> > >>>> > > > please do the following:
> > > >> > >>>> > > >
> > > >> > >>>> > > > Bring the Trafodion instance down
> > > >> > >>>> > > > echo $MY_SQROOT -- shows Trafodion installation directory
> > > >> > >>>> > > > Remove $MY_SQROOT/etc/ms.env from all nodes
> > > >> > >>>> > > >
> > > >> > >>>> > > > Start a New Terminal Session so that new Java settings are
> > > >> > >>>> > > > in place
> > > >> > >>>> > > > Login as a Trafodion user
> > > >> > >>>> > > > cd <trafodion_installation_directory>
> > > >> > >>>> > > > . ./sqenv.sh (skip this if it is done automatically upon
> > > >> > >>>> > > > logon)
> > > >> > >>>> > > > sqgen
> > > >> > >>>> > > >
> > > >> > >>>> > > > Exit and Start a New Terminal Session
> > > >> > >>>> > > > Restart the Trafodion instance and check if you are seeing
> > > >> > >>>> > > > the issue with tdm_udrserv again. We wanted to ensure that
> > > >> > >>>> > > > the trafodion processes are free of JAVA installation mixup
> > > >> > >>>> > > > in your earlier message. We suspect that can cause
> > > >> > >>>> > > > tdm_udrserv process to dump core.
> > > >> > >>>> > > >
> > > >> > >>>> > > > Selva
> > > >> > >>>> > > >
> > > >> > >>>> > > > -----Original Message-----
> > > >> > >>>> > > > From: Radu Marias [mailto:[email protected]]
> > > >> > >>>> > > > Sent: Wednesday, September 16, 2015 5:40 AM
> > > >> > >>>> > > > To: dev <[email protected]>
> > > >> > >>>> > > > Subject: Re: odbc and/or hammerdb logs
> > > >> > >>>> > > >
> > > >> > >>>> > > > I'm seeing this in hammerdb logs, I assume it is due to the
> > > >> > >>>> > > > crash and some processes being stopped:
> > > >> > >>>> > > >
> > > >> > >>>> > > > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion
> > > >> > >>>> > > > Database] SQL ERROR:*** ERROR[2034] $Z0106BZ:16: Operating
> > > >> > >>>> > > > system error 201 while communicating with server process
> > > >> > >>>> > > > $Z010LPE:23. [2015-09-16 12:35:33]
> > > >> > >>>> > > > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:***
> > > >> > >>>> > > > ERROR[8904] SQL did not receive a reply from MXUDR, possibly
> > > >> > >>>> > > > caused by internal errors when executing user-defined
> > > >> > >>>> > > > routines. [2015-09-16 12:35:33]
> > > >> > >>>> > > >
> > > >> > >>>> > > > $ sqcheck
> > > >> > >>>> > > > Checking if processes are up.
> > > >> > >>>> > > > Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
> > > >> > >>>> > > >
> > > >> > >>>> > > > The SQ environment is up!
> > > >> > >>>> > > >
> > > >> > >>>> > > >
> > > >> > >>>> > > > Process   Configured  Actual  Down
> > > >> > >>>> > > > -------   ----------  ------  ----
> > > >> > >>>> > > > DTM       5           5
> > > >> > >>>> > > > RMS       10          10
> > > >> > >>>> > > > MXOSRVR   20          20
> > > >> > >>>> > > >
> > > >> > >>>> > > > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <[email protected]>
> > > >> > >>>> > > > wrote:
> > > >> > >>>> > > >
> > > >> > >>>> > > > > I've restarted hdp and trafodion and now I managed to
> > > >> > >>>> > > > > create the schema and stored procedures from hammerdb.
> > > >> > >>>> > > > > But I'm getting failures and core dumps again from
> > > >> > >>>> > > > > trafodion while running virtual users. For some of the
> > > >> > >>>> > > > > users I sometimes see in hammerdb logs:
> > > >> > >>>> > > > > Vuser 5:Failed to execute payment
> > > >> > >>>> > > > > Vuser 5:Failed to execute stock level
> > > >> > >>>> > > > > Vuser 5:Failed to execute new order
> > > >> > >>>> > > > >
> > > >> > >>>> > > > > Core files are on our last node, feel free to examine
> > > >> > >>>> > > > > them, the files were dumped while getting hammerdb errors:
> > > >> > >>>> > > > >
> > > >> > >>>> > > > > *core.49256*
> > > >> > >>>> > > > >
> > > >> > >>>> > > > > *core.48633*
> > > >> > >>>> > > > >
> > > >> > >>>> > > > > *core.49290*
> > > >> > >>>> > > > >
> > > >> > >>>> > > > >
> > > >> > >>>> > > > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <[email protected]>
> > > >> > >>>> > > > > wrote:
> > > >> > >>>> > > > >
> > > >> > >>>> > > > >> *Scenario 1:*
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> I've created this issue
> > > >> > >>>> > > > >> https://issues.apache.org/jira/browse/TRAFODION-1492
> > > >> > >>>> > > > >> I think another fix was made related to *Committed_AS*
> > > >> > >>>> > > > >> in *sql/cli/memmonitor.cpp*.
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> This is a response from Narendra in a previous thread
> > > >> > >>>> > > > >> where the issue was fixed so that trafodion could start:
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
> > > >> > >>>> > > > >>> /proc/meminfo does not have the ‘Committed_AS’ entry,
> > > >> > >>>> > > > >>> it will ignore it. Built it and put the binary:
> > > >> > >>>> > > > >>> libcli.so on the veracity box (in the
> > > >> > >>>> > > > >>> $MY_SQROOT/export/lib64 directory – on all the nodes).
> > > >> > >>>> > > > >>> Restarted the env and ‘sqlci’ worked fine.
> > > >> > >>>> > > > >>> Was able to ‘initialize trafodion’ and create a table.*
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> *Scenario 2:*
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> The *java -version* problem I recall we had only on the
> > > >> > >>>> > > > >> other cluster with centos 7, I didn't see it on this one
> > > >> > >>>> > > > >> with centos 6.7. But a change I made these days in the
> > > >> > >>>> > > > >> latter one is installing oracle *jdk 1.7.0_79* as the
> > > >> > >>>> > > > >> default one, and that is where *JAVA_HOME* points to.
> > > >> > >>>> > > > >> Before that some nodes had *open-jdk* as default and
> > > >> > >>>> > > > >> others didn't have one but just the one installed by
> > > >> > >>>> > > > >> path by *ambari* in */usr/jdk64/jdk1.7.0_67* but which
> > > >> > >>>> > > > >> was not linked to JAVA_HOME or the *java* command by
> > > >> > >>>> > > > >> *alternatives*.
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> *Failures in HammerDB:*
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> Attached is the *trafodion.dtm.log* from a node on
> > > >> > >>>> > > > >> which I see a lot of lines like these and I assume this
> > > >> > >>>> > > > >> is the *transaction conflict* that you mentioned, I see
> > > >> > >>>> > > > >> these lines on 4 out of 5 nodes:
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
> > > >> > >>>> > > > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is false
> > > >> > >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is false
> > > >> > >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
> > > >> > >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
> > > >> > >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is false
> > > >> > >>>> > > > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint: disableBlockCache is false
> > > >> > >>>> > > > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush is false
> > > >> > >>>> > > > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
> > > >> > >>>> > > > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
> > > >> > >>>> > > > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> What does *transaction conflict* mean in this case?
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
> > > >> > >>>> > > > >> [email protected]> wrote:
> > > >> > >>>> > > > >>
> > > >> > >>>> > > > >>> Hi Radu,
> > > >> > >>>> > > > >>>
> > > >> > >>>> > > > >>> Thanks for using Trafodion. With the help from Suresh,
> > > >> > >>>> > > > >>> we looked at the core files in your cluster. We
> > > >> > >>>> > > > >>> believe that there are two scenarios that are causing
> > > >> > >>>> > > > >>> the Trafodion processes to dump core.
> > > >> > >>>> > > > >>>
> > > >> > >>>> > > > >>> Scenario 1:
> > > >> > >>>> > > > >>> Core dumped by tdm_arkesp processes. Trafodion engine
> > > >> > >>>> > > > >>> has assumed the entity /proc/meminfo/Committed_AS is
> > > >> > >>>> > > > >>> available in all flavors of linux. The absence of this
> > > >> > >>>> > > > >>> entity is not handled correctly by the trafodion
> > > >> > >>>> > > > >>> tdm_arkesp process and hence it dumped core.
> > > >> > >>>> > > > >>> Please file a JIRA using this link
> > > >> > >>>> > > > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
> > > >> > >>>> > > > >>> and choose "Apache Trafodion" as the project to report
> > > >> > >>>> > > > >>> a bug against.
> > > >> > >>>> > > > >>>
> > > >> > >>>> > > > >>> Scenario 2:
> > > >> > >>>> > > > >>> Core dumped by tdm_udrserv processes. From our
> > > >> > >>>> > > > >>> analysis, this problem happened when the process
> > > >> > >>>> > > > >>> attempted to create the JVM instance programmatically.
> > > >> > >>>> > > > >>> A few days earlier, we observed a similar issue in
> > > >> > >>>> > > > >>> your cluster when the java -version command was
> > > >> > >>>> > > > >>> attempted. But java -version or $JAVA_HOME/bin/java
> > > >> > >>>> > > > >>> -version works fine now.
> > > >> > >>>> > > > >>> Was there any change made to the cluster recently to
> > > >> > >>>> > > > >>> avoid the problem with the java -version command?
> > > >> > >>>> > > > >>>
> > > >> > >>>> > > > >>> Please delete all the core files in the sql/scripts
> > > >> > >>>> > > > >>> directory and issue the command to invoke the SPJ and
> > > >> > >>>> > > > >>> check if it still dumps core. We can look at the core
> > > >> > >>>> > > > >>> file if it happens again. Your solution to the java
> > > >> > >>>> > > > >>> -version command would be helpful.
>>
>> For the failures with HammerDB, can you please send us the exact
>> error message returned by the Trafodion engine to the application?
>> This might help us narrow down the cause. You can also look at
>> $MY_SQROOT/logs/trafodion.dtm.log to check if any transaction
>> conflict is causing this error.
>>
>> Selva
>>
>> -----Original Message-----
>> From: Radu Marias [mailto:[email protected]]
>> Sent: Tuesday, September 15, 2015 9:09 AM
>> To: dev <[email protected]>
>> Subject: Re: odbc and/or hammerdb logs
>>
>> Also noticed there are several core. files from today in
>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed,
>> please provide a gmail address so I can share them via gdrive.
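The suggestion above to check trafodion.dtm.log for transaction conflicts can be sketched as a small script. This is only an illustration: the pattern is modeled on the three RET_HASCONFLICT lines quoted earlier in this thread, and the function names `find_conflicts` and `conflicts_per_minute` are made up for the example. Other Trafodion versions may log these entries differently.

```python
import re
from collections import Counter

# Pattern modeled on the log lines quoted in this thread (assumption:
# other Trafodion releases may use a different layout).
CONFLICT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO "
    r"dtm\.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, "
    r"txid: (?P<txid>\d+)"
)

# The three entries quoted in the thread, used here as sample input.
SAMPLE = [
    "2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222",
    "2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277",
    "2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309",
]


def find_conflicts(lines):
    """Return (timestamp, txid) pairs for each RET_HASCONFLICT entry."""
    hits = []
    for line in lines:
        match = CONFLICT_RE.search(line.strip())
        if match:
            hits.append((match.group("ts"), int(match.group("txid"))))
    return hits


def conflicts_per_minute(hits):
    """Count conflicts per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    return Counter(ts[:16] for ts, _txid in hits)
```

To run it against a real log, open the file (for example the path obtained from `os.path.expandvars("$MY_SQROOT/logs/trafodion.dtm.log")`) and pass the file object to `find_conflicts`. Comparing the per-minute counts with the timestamps of the HammerDB "Failed to execute" messages may show whether the two line up.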
>>
>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm running HammerDB over trafodion and when running virtual users
>>> sometimes I get errors like this in hammerdb logs:
>>> *Vuser 1:Failed to execute payment*
>>>
>>> *Vuser 1:Failed to execute new order*
>>>
>>> I'm using unixODBC and I tried to add these lines in
>>> */etc/odbc.ini* but the trace file is not created.
>>> *[ODBC]*
>>> *Trace = 1*
>>> *TraceFile = /var/log/odbc_tracefile.log*
>>>
>>> Also tried with *Trace = yes* and *Trace = on*, I've found
>>> multiple references for both.
>>>
>>> How can I see more logs to debug the issue? Can I enable logs for
>>> all queries in trafodion?
>>>
>>> --
>>> And in the end, it's not the years in your life that count. It's
>>> the life in your years.
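On the tracing question in the original message: unixODBC normally reads the [ODBC] trace settings from odbcinst.ini rather than odbc.ini, and `odbcinst -j` prints which files are in use on a given machine. The following is a minimal sketch for checking whether a file's contents enable tracing. The function name `odbc_trace_settings` is hypothetical, and treating 1, yes, and on as "enabled" is an assumption based on the values mentioned in this thread, not a statement about how the driver manager parses the value.

```python
import configparser


def odbc_trace_settings(ini_text):
    """Return (trace_enabled, trace_file) from the [ODBC] section.

    Assumption: values such as 1/yes/on count as enabled, which is how
    configparser's getboolean() interprets them.
    """
    # strict=False tolerates duplicate keys; interpolation=None keeps
    # any '%' characters in paths from being treated specially.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("ODBC"):
        return False, None
    enabled = parser.getboolean("ODBC", "Trace", fallback=False)
    trace_file = parser.get("ODBC", "TraceFile", fallback=None)
    return enabled, trace_file
```

Reading the candidate file into a string and passing it to `odbc_trace_settings` shows quickly whether the section was picked up at all. The user running the ODBC client also needs write permission on the TraceFile location.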
