Nope. Server 1 runs Debian 5 with Java version "1.6.0_12", and server 2 runs RHEL with Java version "1.5.0_18". Could this cause the problem?

Another strange thing I noticed is that if I re-submit the entire item and file and then run filter-media, the text is extracted correctly. So it seems to me that the old data in the transferred assetstore is handled incorrectly. Strange, eh?

-Mika

2009/6/15 Terrance Davis <terrance.da...@utah.edu>:
> Hi Mika,
>
> Are both systems using the same OS version and the same version of Java?
>
> Best regards,
>
> Terrance
>
> --
> Web Applications Programmer
> Institute for Clean and Secure Energy
> University of Utah
> http://www.ices.utah.edu
>
>
> On Jun 15, 2009, at 2:01 AM, mikan.d.dspace listmail wrote:
>
>> Hi Terrance,
>>
>> I double-checked the indexes in configuration and they do match. What
>> I noticed though, is that the text extracted from pdf files differ,
>> which might be the cause of this problem. It seems that when
>> filter-media extracts the text on the other server, it messes up some
>> special characters, thus making them unsearchable. What might be
>> causing this? Both databases are set to UNICODE when created. Is
>> there some other system setting that might be causing this?
>>
>> Example of extracted text is below:
>>
>> Server 1: (correct encoding)
>> 3. PUNAISEN KIRJAN SISÄLTÖ
>> Jaettiin punaisen kirjan sisällön päivitystä varten vastuuhenkilöt
>> seuraavaksi:
>> 3.1 Yleisasu ja kirjan sisällön järjestys miettii ja tarkastelee Tiina
>> Sairanen
>>
>> Server 2: (Messed up characters)
>>
>> 3. PUNAISEN KIRJAN SIS?LT?
>> Jaettiin punaisen kirjan sis?ll?n p?ivityst? varten vastuuhenkil?t
>> seuraavaksi:
>> 3.1 Yleisasu ja kirjan sis?ll?n j?rjestys miettii ja tarkastelee Tiina
>> Sairanen
>>
>>
>> Thanks for any help,
>> Mika
>>
>>
>> 2009/6/12 Terrance Davis <terrance.da...@utah.edu>:
>>>
>>> Hi Mika,
>>> My first guess is that your config files don't match. You might want to
>>> check the server that is returning 40 results.
>>> If the configured search
>>> indexes have any white space (such as a tab) after the properties, they
>>> might not be matching up with the Dublin Core and not indexing properly.
>>> No trim() is happening on the configured search index properties from the
>>> 1.5.2 dspace.cfg, so they may look the same, but be thrown off by extra
>>> unwanted white space.
>>> Best regards,
>>> Terrance Davis
>>> --
>>> Web Applications Programmer
>>> Institute for Clean and Secure Energy
>>> University of Utah
>>> http://www.ices.utah.edu/
>>>
>>>
>>>
>>> On Jun 12, 2009, at 5:24 AM, mikan.d.dspace listmail wrote:
>>>
>>> I'm confused by the way DSpace search works. I cloned our DSpace 1.5.2
>>> instance to another server. They both have the same config, same items
>>> etc. However, when I run a search I get different results. With the same
>>> search term, one search shows 40 results and the other 72. I've
>>> forced reindexing and media-filters but nothing changes. What could be
>>> the cause of this?
>>>
>>> Thanks,
>>> Mika
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Crystal Reports - New Free Runtime and 30 Day Trial
>>> Check out the new simplified licensing option that enables unlimited
>>> royalty-free distribution of the report engine for externally facing
>>> server and web deployment.
>>> http://p.sf.net/sfu/businessobjects
>>> _______________________________________________
>>> DSpace-tech mailing list
>>> DSpace-tech@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
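
A note on the "?" characters in the Server 2 sample: one "?" per accented letter is the pattern you get when text is written through a character set that cannot represent Ä and Ö. The sketch below only reproduces that symptom in Python. It is not DSpace or filter-media code, the helper name simulate_lossy_write is made up for illustration, and whether this is what actually happened on server 2 is an assumption that the thread does not confirm.

```python
# Illustration only: reproduces the "?" pattern from the thread by forcing
# text through a charset that cannot represent the Finnish letters.
# simulate_lossy_write is a hypothetical helper, not part of DSpace.

def simulate_lossy_write(text: str, charset: str) -> str:
    """Encode text to the given charset, substituting '?' for every
    character the charset cannot represent, then decode it again the
    way a later reader of the stored bytes would."""
    return text.encode(charset, errors="replace").decode(charset)


# "3. PUNAISEN KIRJAN SISÄLTÖ", with the accented letters written as
# escapes so the example does not depend on the source file's encoding.
original = "3. PUNAISEN KIRJAN SIS\u00c4LT\u00d6"

ascii_result = simulate_lossy_write(original, "ascii")   # lossy charset
utf8_result = simulate_lossy_write(original, "utf-8")    # lossless charset

print(ascii_result)
print(utf8_result)
```

If this is the mechanism, a reasonable next check would be to compare the locale settings (for example LANG) and the JVM's file.encoding system property in the environment where filter-media runs on each server.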