Hi Bill,

You have got it right. I set up the new DSpace instance roughly as follows:

1. Did a fresh install of DSpace from source.
2. Imported a database dump from the other server (taken with pg_dump;
I also tried pg_restore, by the way).
3. Created assetstore.tar.gz on my old server and copied it to the
new server.

When I run filter-media or filter-media --force, the extracted text
doesn't get the special characters (say ä, ö, å) right, but has a '?'
in place of each one. On my original server everything works fine.
And on my new server, new submissions work fine after
filter-media.
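
For illustration only, here is a minimal Python sketch of how this kind
of '?' substitution can arise: encoding text into a charset that cannot
represent ä, ö or å replaces each such character with '?'. That
something similar is happening here (for example, a different default
charset on the new server) is an assumption, not a confirmed diagnosis,
and Python merely stands in for the Java code that DSpace actually runs.
The function name lossy_roundtrip is made up for this example.

```python
# Illustration only: a lossy encode turns non-ASCII letters into '?'.
# Assumption (unconfirmed): the new server ends up using an ASCII-like charset
# somewhere in the text-extraction path.

def lossy_roundtrip(text: str, charset: str) -> str:
    """Encode text into charset, replacing unencodable characters with '?'."""
    return text.encode(charset, errors="replace").decode(charset)

sample = "3. PUNAISEN KIRJAN SISÄLTÖ"
print(lossy_roundtrip(sample, "ascii"))  # the non-ASCII letters come out as '?'
print(lossy_roundtrip(sample, "utf-8"))  # the text survives unchanged
```

On the Java side, the default charset is normally derived from the
locale and is visible as the file.encoding system property, so comparing
that value on both servers may be a reasonable first check.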

I just re-ran filter-media -f and no error messages came up. Maybe I
should dig into the assetstore to see what the files look like from the
command line? How can I find the assetstore path for a specific
item?
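
A rough sketch of how such a path might be derived, in Python for
illustration. It assumes (unverified against this installation) that
each bitstream row in the database carries an internal_id, and that the
file is stored under three directory levels of two characters each,
taken from the start of that id. The function name assetstore_path and
the example values are hypothetical.

```python
import posixpath

def assetstore_path(assetstore_dir: str, internal_id: str) -> str:
    """Build the on-disk path for a bitstream from its internal_id.

    Assumed layout: three directory levels of two characters each, taken
    from the start of internal_id, followed by the full internal_id as
    the file name.
    """
    if len(internal_id) < 6:
        raise ValueError("internal_id is too short for the assumed layout")
    levels = [internal_id[i:i + 2] for i in (0, 2, 4)]
    return posixpath.join(assetstore_dir, *levels, internal_id)

# Hypothetical values; substitute the real assetstore directory and internal_id.
print(assetstore_path("/dspace/assetstore", "12345678901234567890"))
```

If the assumption holds, the printed path can be checked with ls or file
on each server to compare the stored files.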

Thanks,
Mika


2009/6/16  <bill.ander...@library.gatech.edu>:
> Correct me if I don't have this right: you had an existing instance of
> DSpace, where search worked properly. You cloned the instance to a new server,
> and after the transfer, media filter wasn't able to extract full text
> properly from PDFs with special characters in them. When you re-submit the
> PDFs to the new instance, media filter (and thus search) works as it should?
>
> It's possible the PDFs were damaged in the transfer. How did you transfer
> them?
>
> I assume you're not seeing any errors in the media filter output, right?
>
> Cheers,
>
> Bill
>
> Bill Anderson
> Software Developer
> Digital Library Development
> Georgia Tech Library
>
> ----- "mikan.d.dspace listmail" <mikan.dsp...@gmail.com> wrote:
>
> | Hi Stuart,
> | As I mentioned in my earlier post, running filter-media with the --force
> | (-f) switch didn't fix the problem.
> |
> | -Mika
> |
> | 2009/6/16 Stuart Lewis <s.le...@auckland.ac.nz>:
> | > Hi Mika,
> | >
> | > Since running filter-media on new items seems OK, have you tried running:
> | >
> | > [dspace]/bin/filter-media -f
> | >
> | > -f forces all the bitstreams to be re-filtered.
> | >
> | > Thanks,
> | >
> | >
> | > Stuart Lewis
> | > Digital Services Programmer
> | > Te Tumu Herenga The University of Auckland Library
> | > Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
> | > Ph: 64 9 373-7599 x81928
> | > http://www.library.auckland.ac.nz/
> | >
> | >
> | >
> | > -----Original Message-----
> | > From: mikan.d.dspace listmail [mailto:mikan.dsp...@gmail.com]
> | > Sent: Tuesday, 16 June 2009 1:05 a.m.
> | > To: Terrance Davis
> | > Cc: Dspace Tech
> | > Subject: Re: [Dspace-tech] DSpace search weirdness
> | >
> | > Nope.
> | > Server 1 has Debian 5 with Java version "1.6.0_12", and server 2
> | > has RHEL with Java version "1.5.0_18". Could this cause the problem?
> | >
> | > Another strange thing I noticed is that if I re-submit the entire
> | > item and file and then run filter-media, the text is extracted
> | > correctly. So, to me it seems that the old data in the transferred
> | > assetstore is handled incorrectly. Strange, eh?
> | >
> | > -Mika
> | >
> | >
> | >
> | >
> | > 2009/6/15 Terrance Davis <terrance.da...@utah.edu>:
> | >> Hi Mika,
> | >>
> | >> Are both systems using the same OS version and the same version of Java?
> | >>
> | >> Best regards,
> | >>
> | >> Terrance
> | >>
> | >> --
> | >> Web Applications Programmer
> | >> Institute for Clean and Secure Energy
> | >> University of Utah
> | >> http://www.ices.utah.edu
> | >>
> | >>
> | >> On Jun 15, 2009, at 2:01 AM, mikan.d.dspace listmail wrote:
> | >>
> | >>> Hi Terrance,
> | >>>
> | >>> I double-checked the indexes in the configuration and they do match.
> | >>> What I noticed though, is that the text extracted from PDF files differs,
> | >>> which might be the cause of this problem. It seems that when
> | >>> filter-media extracts the text on the other server, it messes up some
> | >>> special characters, thus making them unsearchable. What might be
> | >>> causing this? Both databases were set to UNICODE when created. Is
> | >>> there some other system setting that might be causing this?
> | >>>
> | >>> Example of extracted text is below:
> | >>>
> | >>> Server 1: (correct encoding)
> | >>> 3. PUNAISEN KIRJAN SISÄLTÖ
> | >>> Jaettiin punaisen kirjan sisällön päivitystä varten vastuuhenkilöt
> | >>> seuraavaksi:
> | >>> 3.1 Yleisasu ja kirjan sisällön järjestys miettii ja tarkastelee Tiina
> | >>> Sairanen
> | >>>
> | >>> Server 2: (Messed up characters)
> | >>>
> | >>> 3. PUNAISEN KIRJAN SIS?LT?
> | >>> Jaettiin punaisen kirjan sis?ll?n p?ivityst? varten vastuuhenkil?t
> | >>> seuraavaksi:
> | >>> 3.1 Yleisasu ja kirjan sis?ll?n j?rjestys miettii ja tarkastelee Tiina
> | >>> Sairanen
> | >>>
> | >>>
> | >>> Thanks for any help,
> | >>> Mika
> | >>>
> | >>>
> | >>> 2009/6/12 Terrance Davis <terrance.da...@utah.edu>:
> | >>>>
> | >>>> Hi Mika,
> | >>>> My first guess is that your config files don't match. You might want to
> | >>>> check the server that is returning 40 results. If the configured search
> | >>>> indexes have any white space (such as a tab) after the properties, they
> | >>>> might not be matching up with the Dublin Core and not indexing properly.
> | >>>> No trim() is happening on the configured search index properties from the
> | >>>> 1.5.2 dspace.cfg, so they may look the same, but be thrown off by extra
> | >>>> unwanted white space.
> | >>>> Best regards,
> | >>>> Terrance Davis
> | >>>> --
> | >>>> Web Applications Programmer
> | >>>> Institute for Clean and Secure Energy
> | >>>> University of Utah
> | >>>> http://www.ices.utah.edu/
> | >>>>
> | >>>>
> | >>>>
> | >>>> On Jun 12, 2009, at 5:24 AM, mikan.d.dspace listmail wrote:
> | >>>>
> | >>>> I'm confused by the way DSpace search works. I cloned our DSpace 1.5.2
> | >>>> instance to another server. They both have the same config, same items,
> | >>>> etc. However, when I run a search I get different results. With the same
> | >>>> search term, one server shows 40 results and the other 72. I've
> | >>>> forced reindexing and media-filters but nothing changes. What could be
> | >>>> the cause of this?
> | >>>>
> | >>>> Thanks,
> | >>>> Mika
> | >>>>
> | >>>>
> | >>>>
> | >>>> ------------------------------------------------------------------------------
> | >>>> Crystal Reports - New Free Runtime and 30 Day Trial
> | >>>> Check out the new simplified licensing option that enables
> | unlimited
> | >>>> royalty-free distribution of the report engine for externally
> | facing
> | >>>> server and web deployment.
> | >>>> http://p.sf.net/sfu/businessobjects
> | >>>> _______________________________________________
> | >>>> DSpace-tech mailing list
> | >>>> DSpace-tech@lists.sourceforge.net
> | >>>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> | >>>>
> | >>>>
> | >>
> | >>
> | >
> | >
> | >
> |
>
