Re: [GNC-dev] Import PDF to GnuCash

2018-07-26 Thread Bob Gustafson

Take a look at Tesseract

https://github.com/tesseract-ocr/tesseract


On 07/26/2018 02:56 PM, deltatango wrote:


Hello,

Very interested in the possibility of importing PDF statements into GnuCash.

I know Quickbooks now has this functionality.

I searched online and found a few clunky possibilities that would convert
the data into excel which can then be converted to csv and then imported
into GnuCash.

I was envisioning a system where you select a PDF statement to be imported.

The program then asks you to select the area of the statement which contains
the transactions, much like a photoshop selection. (And perhaps you could
save templates of selections for different statements).

Then some kind of OCR scanning reads the columns and data and converts them to
columns/rows.

Is this in the realm of possibility for some future release?

It is so common now that exporting CSV, QFX, etc. files from your bank only
goes so far back, and you have to download PDFs instead...

I dream, I hope...

But in vain I wish not...



--
Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html
___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel


Re: [GNC-dev] Import PDF to GnuCash

2018-07-26 Thread John Ralls



> On Jul 26, 2018, at 12:56 PM, deltatango wrote:
> 
> Hello,
> 
> Very interested in the possibility of importing PDF statements into GnuCash.
> 
> I know Quickbooks now has this functionality.
> 
> I searched online and found a few clunky possibilities that would convert
> the data into excel which can then be converted to csv and then imported
> into GnuCash.
> 
> I was envisioning a system where you select a PDF statement to be imported.
> 
> The program then asks you to select the area of the statement which contains
> the transactions, much like a photoshop selection. (And perhaps you could
> save templates of selections for different statements).
> 
> Then some kind of OCR scanning reads the columns and data and converts them to
> columns/rows.
> 
> Is this in the realm of possibility for some future release?
> 
> It is so common now that exporting CSV, QFX, etc. files from your bank only
> goes so far back, and you have to download PDFs instead...
> 
> I dream, I hope...
> 
> But in vain I wish not...

It's very unlikely with the current very small development team.  We have other 
priorities that we expect to keep us fully engaged far into the future.

Regards,
John Ralls



Re: [GNC-dev] Old 'Latest Version' on SourceForge

2018-07-26 Thread Adrien Monteleone

Roger that.

Regards,
Adrien

> On Jul 26, 2018, at 5:22 PM, John Ralls wrote:
> 
> 
> 
>> On Jul 26, 2018, at 12:09 PM, Adrien Monteleone wrote:
>> 
>> I went to SourceForge and noticed that the big green ‘Download Latest 
>> Version’ and ‘Download’ buttons are set to 3.1 and not 3.2. At least this is 
>> for the .dmg. On Windows the file offered is the 3.1 tarball, not the .exe. 
>> (I turned scripts on just in case auto-detection was the issue, but it still 
>> offered the tarball instead of the .exe installer. Could this be a ‘private 
>> browsing’ or ‘cookie’ issue?)
>> 
> 
> Adrien,
> 
> No, it's a setting in the SourceForge file manager. It has a (somewhat 
> flakey) automatic setting that it guesses from the file type and always 
> offers the latest file of an appropriate type unless it's overridden in the 
> file properties. We do that because we don't want the BGB to offer betas when 
> we're in a beta cycle. Sometimes the bit seems to revert on its own and 
> sometimes I fail to reset it when I upload a new release (in spite of it 
> being in the checklist).
> 
> I've just changed the settings for all of the 3.2 files.
> 
> Regards,
> John Ralls
> 
> 




Re: [GNC-dev] Old 'Latest Version' on SourceForge

2018-07-26 Thread John Ralls


> On Jul 26, 2018, at 12:09 PM, Adrien Monteleone wrote:
> 
> I went to SourceForge and noticed that the big green ‘Download Latest 
> Version’ and ‘Download’ buttons are set to 3.1 and not 3.2. At least this is 
> for the .dmg. On Windows the file offered is the 3.1 tarball, not the .exe. 
> (I turned scripts on just in case auto-detection was the issue, but it still 
> offered the tarball instead of the .exe installer. Could this be a ‘private 
> browsing’ or ‘cookie’ issue?)
> 

Adrien,

No, it's a setting in the SourceForge file manager. It has a (somewhat flaky) 
automatic setting that guesses from the file type and always offers the 
latest file of an appropriate type unless that is overridden in the file 
properties. We override it because we don't want the BGB (the big green 
button) to offer betas when we're in a beta cycle. Sometimes the bit seems to 
revert on its own and sometimes I fail to reset it when I upload a new release 
(in spite of it being in the checklist).

I've just changed the settings for all of the 3.2 files.

Regards,
John Ralls



Re: [GNC-dev] Import PDF to GnuCash

2018-07-26 Thread Jim DeLaHunt

Hello, Delta Tango:

I am not one of the GnuCash developers, but I am a software engineer who 
used to work for Adobe Systems (the creator of PDF), and so my ears 
perked up at your question.


On 2018-07-26 12:56, deltatango wrote:

> Hello,
>
> Very interested in the possibility of importing PDF statements into GnuCash.
>
> I know Quickbooks now has this functionality.

Fascinating!  Could you perhaps send a link to a Quicken page explaining 
what PDF import functionality Quicken has?  I see this page 
 
which says, "Please be aware that Quicken cannot import … PDF … files."

> …I was envisioning a system where you select a PDF statement to be imported.
>
> The program then asks you to select the area of the statement which contains
> the transactions, much like a photoshop selection. (And perhaps you could
> save templates of selections for different statements).
>
> Then some kind of OCR scanning reads the columns and data and converts them to
> columns/rows.
>
> Is this in the realm of possibility for some future release?
> …


I am not a GnuCash developer, so I can't speak about what is in the 
realm of possibility for GnuCash.


I can speak about importing PDF files, in general.

PDF is a container; it can contain different kinds of content which 
might look the same to a human reading the PDF file. It might have a 
collection of commands, "use this font, draw this number '€2.500,00' at 
this location on the page". It might have those same commands, with 
annotations saying, "this is a subtotal". Or, it might have a bitmapped 
image which is a picture of a printed page with those numbers.  
Importing each of those kinds of content is a very different task.


It is like the answer to the question, "is it easy to pour out the 
contents of a cup?"  If the cup contains water: very easy. If the cup 
contains paint: easy for 80%, and you have to use a scraper to get out 
the other 20%. If the cup contains dried concrete: very hard.


If a PDF has a special kind of content which is marked up for easy 
extraction, then maybe it would be less effort to make an importer for 
GnuCash. If the PDF does not have markup, but does have commands to draw 
numbers at specific places, then my guess is that any importer would be 
doing the same thing as a tool that converts the PDF file to a CSV file, 
then imports the CSV file into GnuCash.
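
As a rough illustration of that second case, here is a minimal Python sketch 
of what a PDF-to-CSV step might do once the drawing commands have been reduced 
to positioned text fragments. The (x, y, text) tuples, the 
rows_from_fragments and to_csv helpers, and the 2-point row tolerance are all 
assumptions made for this example. Getting the fragments out of a real PDF 
would need a PDF library, which is not shown here.

```python
import csv
import io


def rows_from_fragments(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows, each ordered left to right.

    A fragment joins the current row when its y is within y_tolerance of the
    row's first fragment. Assumes y grows downward from the top of the page,
    which is a convention chosen for this sketch only.
    """
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for x, y, text in sorted(fragments, key=lambda f: f[1]):
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Within each row, order the cells by x position and keep only the text.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


def to_csv(rows):
    """Serialize rows to CSV text; the csv module quotes embedded commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up fragments, deliberately out of order, as drawing commands can be.
fragments = [
    (300.0, 100.4, "-42.50"),
    (40.0, 100.0, "2018-07-01"),
    (120.0, 100.2, "Grocery store"),
    (40.0, 114.0, "2018-07-03"),
    (300.0, 114.1, "1,200.00"),
    (120.0, 113.9, "Salary"),
]
print(to_csv(rows_from_fragments(fragments)), end="")
```

The resulting text could then go through the existing CSV importer. A real 
statement would also need handling for wrapped descriptions and page breaks, 
which this sketch ignores.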


If the PDF file contains an image, the best you can do is perform OCR, 
then correct the OCR mistakes. Then you have a PDF file with commands to 
draw numbers at specific places, which you handle as in the case above.
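
The "correct the OCR mistakes" step can be sketched in Python for the narrow 
case of amount fields. This is only an illustration: the table of confused 
characters, the amount pattern, and the repair_amount helper are assumptions 
invented for this example, and the OCR step itself is not shown.

```python
import re
from decimal import Decimal

# Hypothetical letter-for-digit confusions. A real OCR engine's error
# pattern would have to be measured, not assumed.
CONFUSIONS = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Amounts like 1,200.00 or -41.50: optional sign, optional thousands
# separators, exactly two decimal places.
AMOUNT = re.compile(r"-?(\d{1,3}(,\d{3})+|\d+)\.\d{2}")


def repair_amount(token):
    """Return the Decimal value of an OCR'd amount token, or None if the
    token still does not look like an amount after the substitutions."""
    candidate = token.strip().translate(CONFUSIONS)
    if AMOUNT.fullmatch(candidate) is None:
        return None  # leave this one for manual correction
    return Decimal(candidate.replace(",", ""))


for token in ["1,2O0.00", "-4l.5O", "Salary"]:
    print(token, "->", repair_amount(token))
```

Applying the substitutions only to fields expected to hold amounts matters, 
because the same table would corrupt ordinary words in a description column.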


It is conceivable to write a machine learning / artificial intelligence 
program to convert PDF files with statements into a data format which is 
practical to import to GnuCash. But the starting requirement for this is 
tens of thousands of PDF files with example statements, and perhaps the 
tens of thousands of data files corresponding to the PDF files, to use 
as references for the learning.


Now, a lot has changed since I worked on PDF. Maybe something new is 
possible that I don't know about. But this is the situation as I see it.


Best regards,

 —Jim DeLaHunt, Vancouver, Canada

--
--Jim DeLaHunt, j...@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
  multilingual websites consultant

  355-1027 Davie St, Vancouver BC V6E 4L2, Canada
 Canada mobile +1-604-376-8953



[GNC-dev] Import PDF to GnuCash

2018-07-26 Thread deltatango

Hello,

Very interested in the possibility of importing PDF statements into GnuCash.

I know Quickbooks now has this functionality.

I searched online and found a few clunky possibilities that would convert
the data into excel which can then be converted to csv and then imported
into GnuCash.

I was envisioning a system where you select a PDF statement to be imported.

The program then asks you to select the area of the statement which contains
the transactions, much like a photoshop selection. (And perhaps you could
save templates of selections for different statements).

Then some kind of OCR scanning reads the columns and data and converts them to
columns/rows.
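
To make the saved-selection idea a little more concrete, here is a minimal 
Python sketch of what such a template might hold and how it could be applied 
to words that have already been recognised on the page. The 
StatementTemplate class, its fields, the cells_in_selection helper and the 
sample coordinates are all hypothetical. Nothing like this exists in GnuCash.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class StatementTemplate:
    """A saved selection: the rectangle holding the transactions, plus the
    x positions where one column ends and the next begins."""
    name: str
    left: float
    top: float
    right: float
    bottom: float
    column_breaks: tuple  # ascending x coordinates inside the rectangle

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def column_of(self, x):
        # Number of column breaks at or to the left of x, which is the
        # zero-based index of the column that x falls in.
        return bisect_right(self.column_breaks, x)


def cells_in_selection(template, words):
    """Keep only the (x, y, text) words inside the selection and tag each
    one with its column index."""
    return [
        (template.column_of(x), y, text)
        for x, y, text in words
        if template.contains(x, y)
    ]


template = StatementTemplate("example-bank", 30, 90, 400, 700, (100, 280))
words = [
    (40, 20, "Example Bank statement"),  # page header, outside the selection
    (40, 100, "2018-07-01"),
    (120, 100, "Grocery store"),
    (300, 100, "-42.50"),
]
print(cells_in_selection(template, words))
```

One template per bank or statement layout could then be stored and reused, 
so the selection only has to be drawn once.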

Is this in the realm of possibility for some future release?

It is so common now that exporting CSV, QFX, etc. files from your bank only
goes so far back, and you have to download PDFs instead...

I dream, I hope...

But in vain I wish not...





[GNC-dev] Old 'Latest Version' on SourceForge

2018-07-26 Thread Adrien Monteleone

I went to SourceForge and noticed that the big green ‘Download Latest Version’ 
and ‘Download’ buttons are set to 3.1 and not 3.2. At least this is for the 
.dmg. On Windows the file offered is the 3.1 tarball, not the .exe. (I turned 
scripts on just in case auto-detection was the issue, but it still offered the 
tarball instead of the .exe installer. Could this be a ‘private browsing’ or 
‘cookie’ issue?)

Regards,
Adrien