Hi Stuart,

As I mentioned in my earlier post, running filter-media with the --force (-f) switch didn't fix the problem.

-Mika

2009/6/16 Stuart Lewis <s.le...@auckland.ac.nz>:
> Hi Mika,
>
> Since running filter-media on new items seems OK, have you tried running:
>
> [dspace]/bin/filter-media -f
>
> -f forces all the bitstreams to be re-filtered.
>
> Thanks,
>
> Stuart Lewis
> Digital Services Programmer
> Te Tumu Herenga The University of Auckland Library
> Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
> Ph: 64 9 373-7599 x81928
> http://www.library.auckland.ac.nz/
>
> -----Original Message-----
> From: mikan.d.dspace listmail [mailto:mikan.dsp...@gmail.com]
> Sent: Tuesday, 16 June 2009 1:05 a.m.
> To: Terrance Davis
> Cc: Dspace Tech
> Subject: Re: [Dspace-tech] DSpace search weirdness
>
> Nope.
> The server 1 has Debian 5 with Java version "1.6.0_12". and server 2
> has RHEL and Java version "1.5.0_18". Could this cause the problem?
>
> Another strange thing I noticed, is that if I re-submit the entire
> item & file and then run filter-media, the text is extracted
> correctly?? So, to me it seems that the old data in the transferred
> assetstore is handled incorrectly. Strange, eh?
>
> -Mika
>
> 2009/6/15 Terrance Davis <terrance.da...@utah.edu>:
>> Hi Mika,
>>
>> Are both systems using the same OS version and the same version of Java?
>>
>> Best regards,
>>
>> Terrance
>>
>> --
>> Web Applications Programmer
>> Institute for Clean and Secure Energy
>> University of Utah
>> http://www.ices.utah.edu
>>
>> On Jun 15, 2009, at 2:01 AM, mikan.d.dspace listmail wrote:
>>
>>> Hi Terrance,
>>>
>>> I double-checked the indexes in configuration and they do match. What
>>> I noticed though, is that the text extracted from pdf files differ,
>>> which might be the cause of this problem. It seems that when
>>> filter-media extracts the text on the other server, it messes up some
>>> special characters, thus making them unsearchable. What might be
>>> causing this? Both databases are set to UNICODE when created.
>>> Is there some other system setting that might be causing this?
>>>
>>> Example of extracted text is below:
>>>
>>> Server 1: (correct encoding)
>>> 3. PUNAISEN KIRJAN SISÄLTÖ
>>> Jaettiin punaisen kirjan sisällön päivitystä varten vastuuhenkilöt
>>> seuraavaksi:
>>> 3.1 Yleisasu ja kirjan sisällön järjestys miettii ja tarkastelee Tiina
>>> Sairanen
>>>
>>> Server 2: (Messed up characters)
>>>
>>> 3. PUNAISEN KIRJAN SIS?LT?
>>> Jaettiin punaisen kirjan sis?ll?n p?ivityst? varten vastuuhenkil?t
>>> seuraavaksi:
>>> 3.1 Yleisasu ja kirjan sis?ll?n j?rjestys miettii ja tarkastelee Tiina
>>> Sairanen
>>>
>>> Thanks for any help,
>>> Mika
>>>
>>> 2009/6/12 Terrance Davis <terrance.da...@utah.edu>:
>>>>
>>>> Hi Mika,
>>>> My first guess is that your config files don't match. You might want to
>>>> check the server that is returning 40 results. If the configured search
>>>> indexes have any white space (such as a tab) after the properties, they
>>>> might not be matching up with the dublin core and not indexing properly.
>>>> No trim() is happening on the configured search index properties from the
>>>> 1.5.2 dspace.cfg, so they may look the same, but be thrown off by extra
>>>> unwanted white space.
>>>> Best regards,
>>>> Terrance Davis
>>>> --
>>>> Web Applications Programmer
>>>> Institute for Clean and Secure Energy
>>>> University of Utah
>>>> http://www.ices.utah.edu/
>>>>
>>>> On Jun 12, 2009, at 5:24 AM, mikan.d.dspace listmail wrote:
>>>>
>>>> Im confused by the way DSpace search works. I cloned our Dspace 1.5.2
>>>> instance to another server. They both have the same config, same items
>>>> etc. However when I run search I get different results?! With the same
>>>> search term the other search shows 40 results and the other 72. I've
>>>> forced reindexing and media-filters but nothing changes. What could be
>>>> the cause of this?
>>>>
>>>> Thanks,
>>>> Mika
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Crystal Reports - New Free Runtime and 30 Day Trial
>>>> Check out the new simplified licensing option that enables unlimited
>>>> royalty-free distribution of the report engine for externally facing
>>>> server and web deployment.
>>>> http://p.sf.net/sfu/businessobjects
>>>> _______________________________________________
>>>> DSpace-tech mailing list
>>>> DSpace-tech@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
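
The Server 2 output quoted in this thread (SIS?LT? where Server 1 shows SISÄLTÖ) matches what Java produces when text is encoded with a charset that has no mapping for Ä and Ö. The sketch below reproduces that effect in isolation. It is only an illustration of one possible explanation: nothing in the thread confirms that filter-media or the JVM default charset is the actual cause. The class name EncodingDemo and the helper roundTrip are hypothetical names invented for this example, and StandardCharsets requires Java 7 or later, which is newer than the JVMs mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of DSpace.
class EncodingDemo {

    // Encode the text to bytes with the given charset, then decode the bytes
    // with the same charset. String.getBytes(Charset) replaces characters the
    // charset cannot represent with the charset's replacement byte ('?' for
    // US-ASCII).
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "SISÄLTÖ" written with Unicode escapes so the source file's own
        // encoding does not affect the demonstration.
        String original = "SIS\u00C4LT\u00D6";

        // The default charset is what the JVM uses when code does not name one.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println("UTF-8:    " + roundTrip(original, StandardCharsets.UTF_8));

        // US-ASCII cannot represent Ä or Ö, so this prints SIS?LT?
        System.out.println("US-ASCII: " + roundTrip(original, StandardCharsets.US_ASCII));
    }
}
```

Running this on both servers and comparing the "JVM default charset" line may be a quick first check. On JVMs of that era the default typically follows the operating system locale or the file.encoding system property, so two otherwise identical installations can differ here.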