[ https://issues.apache.org/jira/browse/DERBY-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Hillegas updated DERBY-7007:
---------------------------------
    Urgency: Normal  (was: Blocker)

> Random IOException: Bad file descriptor on new server platform
> --------------------------------------------------------------
>
>                 Key: DERBY-7007
>                 URL: https://issues.apache.org/jira/browse/DERBY-7007
>             Project: Derby
>          Issue Type: Bug
>          Components: Miscellaneous
>    Affects Versions: 10.12.1.1
>         Environment: Linux: SUSE Linux Enterprise Server for SAP Applications 12 SP3 (x86_64)
> Kernel: 4.4.126-94.22-default #1 SMP Wed Apr 11 07:45:03 UTC 2018 (9649989) x86_64 x86_64 x86_64 GNU/Linux
> Filesystem: /dev/mapper/appsvg-lvapps on /opt/apps type ext3 (rw,relatime,data=ordered)
> Java: IBM 7.1-4.15
> Tomcat: 7.0.85
>            Reporter: Ralf Schubert
>            Priority: Blocker
>
> Our customer is migrating to a new server platform. We currently run several
> applications on their old server platform, and they have been running well
> so far. On the new platform, however, random Derby errors occur reproducibly,
> and we and the customer have been analysing them for several months. The
> deeper we dig, the less we understand, and it looks more and more like a
> Derby bug.
> We would be grateful if somebody could look into this and tell us whether it
> is a bug in Derby, or suggest what else could cause Derby to behave like this.
> h2. Situation
> We have one application that includes several embedded Derby databases.
> After the server starts, the application behaves normally for a few minutes.
> But after a few more minutes, one of the Derby DBs (accessed through
> Hibernate in Derby embedded mode) first reports an error like the following
> on a random Derby file (the file varies each time):
> {code:java}
> Local derby log (/home/tomcat_i36/derby.log):
>
> ------------ Begin Shutdown Error Stack -------------
> ERROR XSDG3: Meta-data for Container(0, 33904) could not be accessed to clean /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat
>     at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
>     at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(Unknown Source)
>     at org.apache.derby.impl.services.cache.ConcurrentCache.cleanEntry(Unknown Source)
>     at org.apache.derby.impl.services.cache.BackgroundCleaner.performWork(Unknown Source)
>     at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
>     at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
>     at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
>     at java.lang.Thread.run(Thread.java:809)
> Caused by: java.io.IOException: Bad file descriptor
>     at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>     at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:65)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>     at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:754)
>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:739)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.getEmbryonicPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
>     ... 8 more
> ============= begin nested exception, level (1) ===========
> java.io.IOException: Bad file descriptor
>     at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>     at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:65)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>     at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:754)
>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:739)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.getEmbryonicPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
>     at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(Unknown Source)
>     at org.apache.derby.impl.services.cache.ConcurrentCache.cleanEntry(Unknown Source)
>     at org.apache.derby.impl.services.cache.BackgroundCleaner.performWork(Unknown Source)
>     at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
>     at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
>     at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
>     at java.lang.Thread.run(Thread.java:809)
> ============= end nested exception, level (1) ===========
> ------------ End Shutdown Error Stack ------------{code}
> After this happens, the DB behaves erratically and throws random errors (e.g.
> reporting that a column is missing from a table although it is there, or
> that the DB is corrupt).
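For readers following the trace above: the innermost frames show the failure inside a positional `FileChannel` read (`FileChannelImpl.read` ending in the native `pread0`), reached from Derby's page-reading code. The sketch below only illustrates that kind of call using the standard `java.nio` API; it is not Derby's implementation. The class name `PositionalReadSketch`, the helper names `readFully` and `readPage`, and the 4096-byte page size used in `main` are all assumptions made for illustration. It may be useful as a stand-alone probe to read one of the affected `.dat` files outside Derby on the new platform.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from the Derby sources.
final class PositionalReadSketch {

    // Fills the buffer completely, starting at the given absolute file offset.
    // Uses the positional FileChannel.read(ByteBuffer, long) overload, which
    // does not change the channel's own position.
    static void readFully(FileChannel channel, ByteBuffer buffer, long offset)
            throws IOException {
        long position = offset;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, position);
            if (n < 0) {
                throw new EOFException("end of file reached at offset " + position);
            }
            position += n;
        }
    }

    // Reads one fixed-size page from a file opened read-only.
    // pageSize is a caller-supplied assumption, not a value from the report.
    static byte[] readPage(Path file, long pageNumber, int pageSize)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(pageSize);
            readFully(channel, buffer, pageNumber * (long) pageSize);
            return buffer.array();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PositionalReadSketch <file>");
            return;
        }
        // 4096 is an assumed page size, used here only for the demonstration.
        byte[] page = readPage(Paths.get(args[0]), 0, 4096);
        System.out.println("read " + page.length + " bytes from page 0");
    }
}
```

The code uses only Java 7 features (try-with-resources, NIO.2), so it should compile on the Java level named in the report. An `IOException` from `channel.read` propagates unchanged, so a "Bad file descriptor" from the operating system would surface here in the same way as in the trace.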
> Note: The application only READS from these databases. It does not write
> any data to them.
> It happens to only one DB, but that is the most complex one in the
> application. Restarting the server makes it WORK again for a few minutes!
> We deploy the exact same WAR file to the old and the new platform for testing.
> h2. Already analysed
> We have already tried the following:
> # Turning off the antivirus solution (Trend Micro Deep Security Agent) did
> not help.
> # Replacing the servers of the new platform with another set of servers
> with the same setup did not help.
> # Comparing SHA-1 hashes of the "corrupt" files with those of the original
> files showed that the files are IDENTICAL.
> # Copying the "corrupt" DB to another system and testing it there worked as
> expected, without issues.
> # Running an integrity check on the DB showed no problems.
> # Checking the file permissions on the problematic servers showed no problems:
> {code:java}
> # ls -l /opt/apps/tomcat/i36/webapps/XXXX/database/XX/seg0/c8470.dat
> -rw-r--r-- 1 tomcat_i36 tomcat 16384 Aug 21 09:32 /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat
>
> # file /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat
> /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat: data{code}
> # Checking whether any Linux limit (e.g. the open files limit) was reached:
> nothing found.
> # Checking for file system corruption: ext3 is used on both the old and the
> new platform; no sign of corrupt files was found.
> # Upgrading Derby from 10.11.1.1 to 10.12.1.1 did not fix the issue.
> h2. The server environments
> h3. OLD environment (working well)
> {code:java}
> Linux: SUSE Linux Enterprise Server 11 SP4 (s390x)
> Kernel: 3.0.101-91-default #1 SMP Mon Dec 12 13:06:13 UTC 2016 (544b9d1) s390x s390x s390x GNU/Linux
> Filesystem: /dev/mapper/appsvg-lvapps on /opt/apps type ext3 (rw,acl,user_xattr)
> Java: IBM 7.1-4.1
> Tomcat: 7.0.70{code}
> h3. NEW environment (not working)
> {code:java}
> Linux: SUSE Linux Enterprise Server for SAP Applications 12 SP3 (x86_64)
> Kernel: 4.4.126-94.22-default #1 SMP Wed Apr 11 07:45:03 UTC 2018 (9649989) x86_64 x86_64 x86_64 GNU/Linux
> Filesystem: /dev/mapper/appsvg-lvapps on /opt/apps type ext3 (rw,relatime,data=ordered)
> Java: IBM 7.1-4.15
> Tomcat: 7.0.85{code}
> Our customer has to migrate to the new server platform very soon, so we
> would be very glad if someone could help us investigate and resolve this.
> Thanks!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)