Re: [Dspace-tech] Dspace: IllegalArgumentException
Hi,

Delving back through the history, it appears that you are using revision #1312 of Browse.java (not sure what release of DSpace that relates to!):

http://dspace.svn.sourceforge.net/viewvc/dspace/branches/dspace-1_4_x/dspace/src/org/dspace/browse/Browse.java?view=log

As you will see, the next revision (1342) specifically has a 'fix' for an Oracle compatibility issue in precisely this location. Try applying the changes as described here:

http://dspace.svn.sourceforge.net/viewvc/dspace/branches/dspace-1_4_x/dspace/src/org/dspace/browse/Browse.java?r1=1312&r2=1342

and if that doesn't work, we'll assist you further.

G
--
Graham Triggs
Technical Architect
Open Repository

Manuel Echeverry wrote:
> Hi,
>
> In our university we have 3 instances of DSpace:
>
> http://dspace.icesi.edu.co/dspace/
> http://dspace.icesi.edu.co/desarrollo/
> http://dspace.icesi.edu.co/academico/
>
> All 3 are installed on the same Linux server and share an Oracle 10g database. For the past few days, all 3 instances have been showing some strange behavior. If you open any of them and try to follow the same menu link twice (for example, open the search link twice), you get an internal error (the same happens if you repeat a search query).
>
> Here I share with the list a sample fragment of the logs from one of the DSpace instances. As you will see, the error is always one of these 2:
>
> * java.lang.IllegalArgumentException: Value is not an long
> * java.lang.NullPointerException
>
> I appreciate any suggestions as to what is going on.
>
> Manuel Echeverry
> Dirección de servicios y recursos de información
> Soporte a Biblioteca
>
> -
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2005.
> http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
> ___
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
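The r1312 to r1342 diff itself is not reproduced in this thread, so the following is only a hedged illustration of one plausible mechanism behind an Oracle-only "Value is not an long" error: Oracle's JDBC driver typically returns NUMBER columns from getObject() as java.math.BigDecimal, so a strict `instanceof Long` check rejects them. `ColumnValues`, `strictLong` and `tolerantLong` are hypothetical names; this is not DSpace's TableRow code and not the actual r1342 fix.

```java
import java.math.BigDecimal;

/** Hypothetical helper; NOT DSpace's TableRow API and NOT the r1342 fix. */
public class ColumnValues {

    // Strict check that mimics the logged message: only java.lang.Long passes.
    static long strictLong(Object value) {
        if (!(value instanceof Long)) {
            throw new IllegalArgumentException("Value is not an long");
        }
        return (Long) value;
    }

    // Tolerant check: also accepts the BigDecimal that Oracle NUMBER columns
    // usually come back as, provided it is an exact integral value.
    static long tolerantLong(Object value) {
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof BigDecimal) {
            // longValueExact() throws ArithmeticException on fractions or overflow
            return ((BigDecimal) value).longValueExact();
        }
        if (value instanceof Integer || value instanceof Short || value instanceof Byte) {
            return ((Number) value).longValue();
        }
        throw new IllegalArgumentException("Value is not an long: " + value);
    }

    public static void main(String[] args) {
        Object fromOracle = new BigDecimal("1342"); // simulated NUMBER column value
        System.out.println(tolerantLong(fromOracle));
        try {
            strictLong(fromOracle);
        } catch (IllegalArgumentException e) {
            System.out.println("strict check failed: " + e.getMessage());
        }
    }
}
```

Using longValueExact() rather than longValue() means a fractional or out-of-range value fails loudly instead of being silently truncated.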
[Dspace-tech] Same author, two names?
Hello,

I have encountered a situation where an author may write his name in different ways. Sometimes the name is incomplete on the publication, and other times he uses only his first and last names. Imagine this case:

John Grant Wood
John G. Wood
John Wood

Even though all three names belong to the same person, I cannot enter them all as authors in DSpace, because a search by author would show 3 different people. So, as the author, I will always use his full name, but users may want to search for "John Wood", and this will obviously fail.

I would like to add this (let's call it) "author publication name" as metadata, but I would also like to keep the dublin_core schema. Does Dublin Core anticipate this situation? I would really like to keep the current schema as pure dublin_core to avoid future interoperability issues between DSpace instances.

Thanks,
Marcelo
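The schema question aside, the search problem can be illustrated with a hedged sketch of one approach an application could take; this is not a DSpace or Dublin Core feature. It derives a matching key from each name variant by keeping only the first and last names, so all three forms above share one key. `NameKey.key` is a hypothetical helper, and it also accepts an inverted "Last, First" form. The obvious trade-off is that it will also merge different people who share a first and last name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical name-matching helper; not part of DSpace or Dublin Core. */
public class NameKey {

    // Reduce a personal name to "first last" in lower case, dropping middle
    // names and initials. Accepts "John Grant Wood" or "Wood, John Grant".
    static String key(String name) {
        String first;
        String last;
        int comma = name.indexOf(',');
        if (comma >= 0) {
            last = name.substring(0, comma).trim();
            String[] given = name.substring(comma + 1).trim().split("\\s+");
            first = given[0];
        } else {
            String[] parts = name.trim().split("\\s+");
            first = parts[0];
            last = parts[parts.length - 1];
        }
        return (first + " " + last).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> variants = Arrays.asList(
                "John Grant Wood", "John G. Wood", "John Wood", "Wood, John G.");
        for (String v : variants) {
            System.out.println(v + " -> " + key(v));
        }
    }
}
```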
Re: [Dspace-tech] Dspace: IllegalArgumentException
Not really sure what the problem might be. Have you tried running index-all and restarting Tomcat? Perhaps more lines from the log file would be useful.

-Jose

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Manuel Echeverry
Sent: Friday, September 21, 2007 11:57 AM
To: dspace-tech@lists.sourceforge.net
Subject: [Dspace-tech] Dspace: IllegalArgumentException
Re: [Dspace-tech] OutOfMemory errors during large PDF indexing
I would also recommend trying the latest PDFBox release and replacing the jar in your lib directory:

http://sourceforge.net/project/showfiles.php?group_id=78314&package_id=79377

On Sep 21, 2007, at 9:46 AM, Tim Donohue wrote:
> So, even setting a maximum heap size of 1GB, we don't seem to be able to full-text index a 15MB PDF without encountering "OutOfMemory: Java heap space" errors. Strange, I know. My current theory is that there may be a memory leak in the PDFBox tools.
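When swapping in a newer PDFBox jar, it may be worth confirming which jar the JVM actually loads, since an older copy earlier on the classpath could still be picked up. Below is a small hedged diagnostic using standard reflection; `WhichJar` and `locate` are hypothetical names. The default class name uses the org.pdfbox package seen in Tim's stack trace; later Apache PDFBox releases use org.apache.pdfbox instead, so pass the appropriate class name as an argument.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: report where a class is loaded from. */
public class WhichJar {

    // Returns the jar/directory URL for className, or a marker string if the
    // class is missing or its code source is not reported (e.g. JDK classes).
    static String locate(String className) {
        try {
            // initialize=false: look the class up without running static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap/unknown";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        } catch (LinkageError e) {
            // e.g. the class was found but one of its dependencies was not
            return "failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.pdfbox.util.PDFTextStripper";
        System.out.println(name + " loaded from: " + locate(name));
    }
}
```

Run it with the same classpath the DSpace command-line tools use; if the printed location is not the jar you just installed, the old one is still winning.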
Re: [Dspace-tech] ETD-MS management
We are just getting going with ETDs, and we've added the thesis.x.x fields as a separate schema called "thesis", along with adding the dc qualifiers that weren't already in the registry. Since it is a different schema, I didn't think it would be appropriate to include it in the dc elements. I don't think it's even possible: as far as I know, a field in a schema can have only an element and at most one qualifier, so dc.thesis.degree.grantor, for example, could not even be entered into the registry.

Shane Beers
Digital Repository Services Librarian
George Mason University
[EMAIL PROTECTED]
703-993-3742

On Sep 20, 2007, at 3:24 PM, Dorothea Salo wrote:
> For those of us using DSpace 1.4+ to store electronic theses and dissertations: If you're using ETD-MS metadata, how are you managing your DSpace metadata registry? Are you placing the non-Dublin-Core ETD-MS metadata (e.g. degree information) inside the Dublin Core schema or in a separate one? If you use a separate one, are you putting the Dublin Core metadata used in ETD-MS inside the new schema, or are you mixing schemas on your ingest and display pages?
>
> I'm about to change my metadata registry to make ETD-MS work as part of the Manakin-based redesign I'm working on, and I'd like to do it in accord with best practices (or at least experience!).
>
> Dorothea
>
> --
> Dorothea Salo [EMAIL PROTECTED]
> Digital Repository Librarian
> AIM: mindsatuw
> University of Wisconsin
> Rm 218, Memorial Library
> (608) 262-5493
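Shane's point is that a registry field is named schema.element with at most one qualifier. A minimal hedged sketch of that naming rule follows; `FieldName` and `parse` are hypothetical and are not DSpace's own metadata classes.

```java
/** Hypothetical illustration of the schema.element[.qualifier] naming rule. */
public class FieldName {
    final String schema;
    final String element;
    final String qualifier; // null for an unqualified field

    private FieldName(String schema, String element, String qualifier) {
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
    }

    // Accepts "schema.element" or "schema.element.qualifier"; a second
    // qualifier (four parts) or an empty part is rejected.
    static FieldName parse(String dotted) {
        String[] parts = dotted.split("\\.", -1); // -1 keeps trailing empty parts
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException(
                    "expected schema.element[.qualifier]: " + dotted);
        }
        for (String p : parts) {
            if (p.isEmpty()) {
                throw new IllegalArgumentException("empty name part in: " + dotted);
            }
        }
        return new FieldName(parts[0], parts[1], parts.length == 3 ? parts[2] : null);
    }

    public static void main(String[] args) {
        FieldName ok = parse("thesis.degree.grantor");
        System.out.println(ok.schema + " / " + ok.element + " / " + ok.qualifier);
        try {
            parse("dc.thesis.degree.grantor");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this rule, a separate "thesis" schema is what makes thesis.degree.grantor expressible at all.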
Re: [Dspace-tech] OutOfMemory errors during large PDF indexing
Jayan & Mark,

Thanks for the suggestions. But our problem is that we're currently running Java and dsrun with:

JAVA_OPTS=-Xmx1024M -Xms1024M -XX:NewRatio=2 -Dfile.encoding=UTF-8

(I've modified our local dsrun script to read from the JAVA_OPTS environment variable.)

So, even setting a maximum heap size of 1GB, we don't seem to be able to full-text index a 15MB PDF without encountering "OutOfMemory: Java heap space" errors. Strange, I know. My current theory is that there may be a memory leak in the PDFBox tools, but I'm still working on a definite diagnosis. If no one else has noticed this with DSpace 1.4.2, then it's possible that something in our local settings (or our customizations of DSpace) is causing this issue.

- Tim
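Because the heap options reach dsrun only through a locally modified script, a quick hedged check that -Xmx1024M actually arrives at the JVM can rule out a configuration slip before suspecting PDFBox. This uses the standard RuntimeMXBean and Runtime APIs; `HeapCheck` is a hypothetical class name. Note that Runtime.maxMemory() can report somewhat less than the -Xmx value, depending on the garbage collector.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: confirm the heap flags actually reached this JVM. */
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // The -Xmx/-Xms flags the JVM was started with, if any.
    static List<String> heapFlags() {
        List<String> flags = new ArrayList<String>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-Xms")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("heap flags: " + heapFlags());
        System.out.println("max heap (MB): " + maxHeapMb());
    }
}
```

Launched with the same JAVA_OPTS the modified dsrun uses, an empty flag list or a max heap far below 1024 MB would point at the script rather than at PDFBox.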
Re: [Dspace-tech] OutOfMemory errors during large PDF indexing
We should consider adding saner defaults. Most machines that DSpace runs on have well over 1GB of memory available, and it's important to remember that this is a maximum heap size and is not taken unless required. I think setting dsrun and the other command-line scripts to 512m (1/2 * 1GB) would eliminate most outlying cases where PDF docs need to be held in memory.

-Mark Diggory

On Sep 21, 2007, at 2:10 AM, Jayan Chirayath Kurian wrote:
> Hi Tim,
>
> We faced similar errors here while trying out full-text indexing on DSpace 1.4.1 / Windows 2003 Standard Edition, with roughly 100,000 records. This was rectified once dsrun.bat was given 1000m in place of the -Xmx256m in its "java -Xmx256m -classpath" line.
> http://repositorydev.ntu.edu.sg
>
> Jayan
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Donohue
> Sent: Friday, September 21, 2007 1:58 AM
> To: dspace-tech
> Subject: [Dspace-tech] OutOfMemory errors during large PDF indexing
>
> All,
>
> I'm curious whether anyone out there has run into strange OutOfMemory errors while full-text indexing larger (>10MB) PDF files in DSpace.
>
> It usually appears as either:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>
> OR
>
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> I've located the main "problem" PDF in our DSpace instance:
> http://hdl.handle.net/2142/2050
>
> I've also done a large amount of searching/testing based on recommendations from various sites. In particular, I've done a memory dump using JHat (http://java.sun.com/javase/6/docs/technotes/tools/share/jhat.html), and it looks like the problem may reside in a potential memory leak in the 3rd-party PDFBox tool used by DSpace 1.4.2. (In particular, it *looks* like PDFBox is attempting to load most/all of the textual content into a giant HashMap.)
>
> Here are the latest settings I've been testing on:
>
> RHEL 4
> Java 1.6.0_02
> Postgres 8.1.9
> DSpace 1.4.2
>
> We also have the following JAVA_OPTS settings in place for our JVM:
>
> JAVA_OPTS=-Xmx1024M -Xms1024M -XX:NewRatio=2 -Dfile.encoding=UTF-8
>
> (We initially had Xmx and Xms at 512MB, but I bumped them up and we're still getting the OutOfMemory exception at 1GB!)
>
> Anyone have any hints/tips or JVM settings to share? I personally don't see why PDFBox would need so much JVM memory to parse a 15MB PDF. But the JHat analysis seemed to be pointing to PDFBox.
>
> - Tim
>
> P.S. An example of the full error stack trace is below:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.HashMap.resize(Unknown Source)
>     at java.util.HashMap.addEntry(Unknown Source)
>     at java.util.HashMap.put(Unknown Source)
>     at org.fontbox.cmap.CMap.addMapping(CMap.java:132)
>     at org.fontbox.cmap.CMapParser.parse(CMapParser.java:153)
>     at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:535)
>     at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
>     at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
>     at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
>     at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
>     at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
>     at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
>     at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
>     at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>     at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
>     at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:114)
>     at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:602)
>     at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:513)
>     at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:461)
>     at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:428)
>     at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:391)
>     at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:342)
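Mark's point that -Xmx is a ceiling rather than an up-front allocation can be seen by comparing the heap the JVM has committed with its maximum. Below is a hedged sketch using java.lang.Runtime; `HeapUsage` and `snapshotMb` are hypothetical names. Note that -Xms sets the initial heap size, so with -Xms1024M as in Tim's settings the committed figure should start at or near the maximum.

```java
/** Hypothetical diagnostic: used vs. committed heap vs. the -Xmx ceiling. */
public class HeapUsage {

    static final long MB = 1024L * 1024L;

    // Returns {used, committed, max} in megabytes.
    static long[] snapshotMb() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();       // heap the JVM currently holds
        long used = committed - rt.freeMemory(); // portion occupied by objects
        long max = rt.maxMemory();               // ceiling, normally set by -Xmx
        return new long[] { used / MB, committed / MB, max / MB };
    }

    public static void main(String[] args) {
        long[] s = snapshotMb();
        System.out.printf("used %d MB, committed %d MB, max %d MB%n", s[0], s[1], s[2]);
    }
}
```

Running it once with `java -Xms64m -Xmx1024m HeapUsage` and once with `java -Xms1024m -Xmx1024m HeapUsage` should show the difference in committed heap while the maximum stays the same.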