Hi Erik, Thanks for getting back to me.

Ah yes, I see what you mean: if I index only plain-text files with Lucene,
Ferret seems to search that index fine.

However, what I'm actually trying to do is index PDFs, using PDFBox to
create the Lucene documents, and Ferret fails when I try to search that
index:

NoMethodError: You have a nil object when you didn't expect it!
The error occured while evaluating nil.name
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_buffer.rb:31:in `read'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:90:in `next?'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:118:in `scan_to'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:285:in `scan_for_term_info'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:163:in `get_term_info'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_reader.rb:176:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `each'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:47:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:13:in `initialize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `new'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `create_weight'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:113:in `initialize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `each'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `initialize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `new'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `create_weight'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/query.rb:51:in `weight'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:107:in `search'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:660:in `do_search'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:331:in `search_each'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `synchronize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `search_each'
    ./lib/ferret_client.rb:34:in `search_index'
    test/functional/ferret_client_test.rb:12:in `test_search_index'

This is a shame, as I thought I was onto a winner with the Lucene/Ferret
combination, especially since PDFBox can create Lucene documents so easily.

I'm not sure whether this is related to your point about higher-order
characters.

Does anyone have experience indexing PDFs in Lucene (using PDFBox) and
then searching the index with Ferret? Or, alternatively, creating Ferret
index documents from PDF files directly in Ruby?

Any ideas or advice gratefully received.
Thanks,
Steven


-- 
Posted via http://www.ruby-forum.com/.
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
