Re: [firebird-support] Text search ...

2019-03-12 Thread Andrew Lowe a...@wht.com.au [firebird-support]
On 13/03/19 06:44, Lester Caine les...@lsces.co.uk [firebird-support] wrote:
> I've got a few sites where I've got a growing number of PDF files 
> whose content it would be nice to actually index. First problem is 
> obviously the varying quality of the PDFs, and I've had FineReader 
> deployed in some cases to provide OCRed copies of the originals, with the 
> usual variable success. The question is just what is the best base to be 
> working towards. I'm currently working on the basis that we store the 
> original file, and I create thumbnails of the front page, so I'm now 
> looking at stripping the raw text. Anybody been there already? Any 
> suggestions for Linux-based solutions ...
> 
> The current indexing process is pulling a list of words from the 
> document and building a manual index. It was first working pre-Firebird 
> and has not changed, so is there a better way with FB3?
> 
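
For readers unfamiliar with the "manual index" approach described in the quoted message (pull a list of words from each document, then record which documents contain each word), here is a minimal sketch in Python. It is an illustration only, not the poster's actual code: the stop-word list, the minimum word length, and the function names are all assumptions, and the sample documents are made up.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real site would tune this to its documents.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "for"}


def extract_words(text):
    """Return the set of distinct lower-cased words worth indexing."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOP_WORDS}


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in extract_words(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = extract_words(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


# Made-up sample documents, keyed by a hypothetical document id.
docs = {
    1: "Minutes of the parish council meeting",
    2: "Agenda for the planning meeting",
}
index = build_index(docs)
print(search(index, "council meeting"))
```

In a database-backed version the word-to-document mapping would normally live in a table with one row per (word, document) pair, which is what makes it searchable with plain SQL.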

Maybe you might want to have a look at Zotero. It does a lot of stuff
with pdf's, databases etc.

Andrew


Re: [firebird-support] Text search ...

2019-03-13 Thread Lester Caine les...@lsces.co.uk [firebird-support]
On 13/03/2019 02:35, Andrew Lowe a...@wht.com.au [firebird-support] wrote:
> Maybe you might want to have a look at Zotero. It does a lot of stuff
> with pdf's, databases etc.

That is working on a different part of the problem although it might 
have provided pointers except for one thing ... "But if you value a 
built-in PDF reader, Zotero may not be the best tool for you." Basically 
it 'suffers' from the same problem I'm trying to address. We have the 
pdf's and in the case of some sites we can even access the original 
source used to create them. Google and Bing crawl the pdf's but using 
them to provide search results is not acceptable for my local council 
clients so we need a 'local' search facility.

BUGGER that was working fine yesterday !!!
https://northwaypc.org.uk/fisheye/gallery/75 is an example, but just 
found the dreaded 'white screen' on looking at a volume ... and the 
minutes and agendas galleries are not working either ... This site was 
fine yesterday and 'I've' not changed anything in my sleep :( Now to 
debug that ...

-- 
Lester Caine - G8HFL
-
Contact - https://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - https://lsces.co.uk
EnquirySolve - https://enquirysolve.com/
Model Engineers Digital Workshop - https://medw.co.uk
Rainbow Digital Media - https://rainbowdigitalmedia.co.uk


Re: [firebird-support] Text search ...

2019-03-13 Thread Steve Wiser st...@specializedbusinesssoftware.com [firebird-support]
We have used tesseract-ocr and lucene (plus firebird for the index data)
for this in the past.  It works pretty well, but not completely out of the
box -- you will have to tweak the image a bit.

-steve
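
As a rough illustration of the OCR half of such a setup, the sketch below rasterises a scanned PDF with poppler's `pdftoppm` and runs each page image through the `tesseract` command-line tool. It is a sketch under assumptions, not the setup described above: it assumes both tools are installed and on the PATH, the function names are hypothetical, and the 300 DPI default is only one plausible knob, since the thread does not say which image tweaks were needed.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterise_cmd(pdf_path, out_prefix, dpi=300):
    """Command for pdftoppm: write one PNG per page at the given DPI."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def ocr_cmd(image_path, lang="eng"):
    """Command for tesseract: OCR one image and print the text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Return the OCRed text of each page of a scanned PDF, in page order."""
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(rasterise_cmd(pdf_path, prefix, dpi), check=True)
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(ocr_cmd(image, lang), check=True,
                                    capture_output=True, text=True)
            pages.append(result.stdout)
    return pages
```

The text returned by `ocr_pdf` would then be handed to whatever indexer is in use. For PDFs that already carry a text layer, poppler's `pdftotext` is usually a cheaper first attempt than OCR.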


Re: [firebird-support] Text search ...

2019-03-14 Thread Lester Caine les...@lsces.co.uk [firebird-support]
On 13/03/2019 14:01, Steve Wiser st...@specializedbusinesssoftware.com 
[firebird-support] wrote:
> We have used tesseract-ocr 
It can be quite amazing what one misses even searching for something 
specific. This has thrown up a couple of add-ons that look interesting 
as well. Just have to fight SUSE to get them installed ... just a pity 
once again it's google who are funding things ... now to see if I can 
get ocrfeeder package for suse!

> lucene (plus firebird for the index data) 
> for this in the past.  It works pretty well, but not completely out of 
> the box -- you will have to tweak the image a bit.
I've looked at lucene in the past and did have a trial configuration set 
up at one time, but support seems to be rather out of date? And .net 
seems to get mixed in ...
I'm a little loath to add Java into the mix anyway, but all the 
popular full text options seem to be Java based :(

https://firebirdsql.org/en/sphinx-full-text-search/ seems to be a dead 
end for Firebird if what I'm reading on V3 is correct re MySQL native 
access.

http://www.firebirdfaq.org/faq328/ links are somewhat tired and out of 
date. I have used IB_FTS in the past, but not being windows based these 
days I'm only using IBObjects on a dwindling number of legacy systems 
and even those will need to be moved to web based sooner rather than 
later. I need something that plays nicely with PHP7 ...


Re: [firebird-support] Text search ...

2019-03-14 Thread Steve Wiser st...@specializedbusinesssoftware.com [firebird-support]
Sorry, to be more clear we used lucene outside of firebird.  We also
happened to store specific information about the files in firebird.  I
stayed away from the lucene within firebird approach because the libraries
seemed out of date and I wasn't sure on support.

hope that helps

-steve
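
The split described here, with the text index held outside the database and the database holding only information about the files, can be sketched roughly as follows. This is an illustration under loud assumptions: Python's built-in `sqlite3` stands in for Firebird purely so the example runs without a server, and the table name, column names, and file paths are all hypothetical.

```python
import sqlite3


def create_catalogue(conn):
    """Create a hypothetical table of per-file information."""
    conn.execute(
        """CREATE TABLE documents (
               doc_id   INTEGER PRIMARY KEY,
               path     TEXT NOT NULL,
               title    TEXT,
               ocr_used INTEGER NOT NULL DEFAULT 0
           )"""
    )


def add_document(conn, path, title, ocr_used=False):
    """Record one file and return the id to store in the external index."""
    cur = conn.execute(
        "INSERT INTO documents (path, title, ocr_used) VALUES (?, ?, ?)",
        (path, title, int(ocr_used)),
    )
    return cur.lastrowid


def describe_hits(conn, doc_ids):
    """Look up catalogue rows for ids returned by the external text index."""
    if not doc_ids:
        return []
    marks = ",".join("?" * len(doc_ids))
    rows = conn.execute(
        f"SELECT doc_id, path, title FROM documents "
        f"WHERE doc_id IN ({marks}) ORDER BY doc_id",
        list(doc_ids),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a Firebird connection
create_catalogue(conn)
agenda_id = add_document(conn, "agendas/2019-03.pdf", "March agenda")
minutes_id = add_document(conn, "minutes/2019-03.pdf", "March minutes", ocr_used=True)
print(describe_hits(conn, [minutes_id]))
```

The only thing the external index needs to store per document is the `doc_id`, so the two halves can be rebuilt or replaced independently.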


Re: [firebird-support] Text search ...

2019-03-14 Thread Lester Caine les...@lsces.co.uk [firebird-support]
On 14/03/2019 13:20, Steve Wiser st...@specializedbusinesssoftware.com 
[firebird-support] wrote:
> Sorry, to be more clear we used lucene outside of firebird.  We also 
> happened to store specific information about the files in firebird.  I 
> stayed away from the lucene within firebird approach because the 
> libraries seemed out of date and I wasn't sure on support.

Having to live with the MySQL on the wordpress sites I don't want to add 
to that mess with more large databases that I can't back up easily. So 
the key here is to move things to a Firebird base where the nightly 
backup just happens without any disruption to the websites ;)

Does not leave many options ...
