Re: [GNC-dev] Import PDF to GnuCash

2018-08-06 Thread Adrien Monteleone
A company I work with just started using QB Pro 2018, so I’ll check on this 
feature, but a web search turned up this forum topic: 
https://quickbooks.intuit.com/community/Do-more-with-QuickBooks/Pdf-Conversion-to-QBO/td-p/145466

which seems to indicate that it’s nothing new. You need 3rd party software to 
convert an *electronically generated* pdf to QBO format which QuickBooks can 
then upload. Scans of paper are not recommended due to the OCR issues, so OCR 
isn’t their method. They seem to be ‘scanning’ the text of the file, but they 
specify ALL of the text has to be selectable and they recommend only PDF 
statements generated by the bank. So most likely, these are programmatically 
generated plain text files that have styling and formatting applied and shipped 
in a PDF container. The rather expensive software reverses the process back to 
plain text and then interprets what the transactions are. I guess it’s possible 
the banks are doing EDI and simply offering customers a ‘pretty printable 
version’ in PDF format but with the EDI fields embedded so the file could still 
be used with EDI and this special software is just taking advantage of that to 
generate an importable format.

Other solutions mentioned for various incarnations of Intuit software are to 
skip the QBO step and go to CSV, which puts us in GnuCash territory. But I’d 
bet dimes to dollars, you don’t need $100+ software to accomplish that task if 
OCR isn’t part of the workflow.
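
As a rough illustration of that last point, here is a minimal Python sketch. It
assumes the text has already been pulled out of the PDF (for example with
poppler's `pdftotext -layout`) and that each transaction sits on one line as
"date, description, amount". That line layout, the regex, and the names
`LINE_RE`, `statement_text_to_csv`, and `SAMPLE` are all hypothetical; a real
bank statement would need its own pattern.

```python
import csv
import io
import re

# Hypothetical layout: ISO date, free-text description, signed amount.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def statement_text_to_csv(text):
    """Turn already-extracted statement text into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Date", "Description", "Amount"])
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # headers, footers, and page numbers are skipped
        date, description, amount = match.groups()
        # Drop thousands separators so the amount parses as a plain number.
        writer.writerow([date, description, amount.replace(",", "")])
    return out.getvalue()


# Made-up sample text, standing in for the output of a text extractor.
SAMPLE = """\
ACME BANK  Statement for July 2018
2018-07-02   GROCERY STORE #12        -45.10
2018-07-05   PAYROLL DEPOSIT        1,200.00
Page 1 of 1
"""

if __name__ == "__main__":
    print(statement_text_to_csv(SAMPLE), end="")
```

The resulting CSV could then go through the existing GnuCash CSV importer, so
nothing PDF-specific would have to live inside GnuCash itself.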

Regards,
Adrien

> On Aug 6, 2018, at 4:47 AM, c.holterm...@gmx.de wrote:
> 
> On 2018-07-26 21:56, deltatango wrote:
>> Hello,
>> Very interested in the possibility of importing PDF statements into GnuCash.
>> I know Quickbooks now has this functionality.
>> I searched online and found a few clunky possibilities that would convert
>> the data into excel which can then be converted to csv and then imported
>> into GnuCash.
>> I was envisioning a system where you select a PDF statement to be imported.
>> The program then asks you to select the area of the statement which contains
>> the transactions, much like a photoshop selection. (And perhaps you could
>> save templates of selections for different statements).
>> Then some kind of OCR scanning reads the columns and data and convert it to
>> columns/rows.
>> Is this in the realm of possibility for some future release?
>> It is so common now that exporting csv or qfx ,etc files from your bank only
>> go so far back and you have to download PDFs instead...
>> I dream, I hope...
>> But in vain I wish not...
>> --
>> Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html
>> ___
>> gnucash-devel mailing list
>> gnucash-devel@gnucash.org
>> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
> 
> Hello !
> 
> I haven't heard of PDF statements before. Is it some sort of embedded data ?
> Quick googling led me to https://pdftables.com/blog/convert-bank-statement.
> It seems to be some sort of standard embedding mechanism. Am I right here ?
> 
> Anyway the way would be to extract the data from the PDF. That would either be
> through this statement data or through OCR. The data could be converted to CSV
> and be imported to gnucash.
> 
> The missing link seems to be extraction of data from PDFs.
> 
> Is there a FOSS tool to extract statement data and convert it to CSV ?
> Or when we go OCR. Is there a tool capable of extracting tables ?
> 
> With OCR you usually only get a text file. It does not recognize that it is
> structured table data. At least with the software I used some years ago that
> had been the case.
> 
> The OCR way would be interesting if there are possible right issues as John
> has pointed out about statement data or reverse engineering.
> 
> regards,
> 
> Christoph
> 




Re: [GNC-dev] Import PDF to GnuCash

2018-08-06 Thread c . holtermann

On 2018-07-26 21:56, deltatango wrote:
> Hello,
>
> Very interested in the possibility of importing PDF statements into GnuCash.
>
> I know Quickbooks now has this functionality.
>
> I searched online and found a few clunky possibilities that would convert
> the data into excel which can then be converted to csv and then imported
> into GnuCash.
>
> I was envisioning a system where you select a PDF statement to be imported.
>
> The program then asks you to select the area of the statement which contains
> the transactions, much like a photoshop selection. (And perhaps you could
> save templates of selections for different statements).
>
> Then some kind of OCR scanning reads the columns and data and convert it to
> columns/rows.
>
> Is this in the realm of possibility for some future release?
>
> It is so common now that exporting csv or qfx ,etc files from your bank only
> go so far back and you have to download PDFs instead...
>
> I dream, I hope...
>
> But in vain I wish not...
>
> --
> Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html

Hello!

I haven't heard of PDF statements before. Is it some sort of embedded data?
A quick search led me to https://pdftables.com/blog/convert-bank-statement.
It seems to be some sort of standard embedding mechanism. Am I right here?

Anyway, the way forward would be to extract the data from the PDF, either
through this statement data or through OCR. The data could then be converted
to CSV and imported into GnuCash.

The missing link seems to be the extraction of data from PDFs.

Is there a FOSS tool to extract statement data and convert it to CSV?
Or, if we go the OCR route, is there a tool capable of extracting tables?

With OCR you usually only get a text file. It does not recognize that the
text is structured table data. At least that was the case with the software
I used some years ago.

The OCR way would be interesting if there are possible rights issues, as John
has pointed out, with statement data or reverse engineering.

regards,

Christoph


Re: [GNC-dev] Import PDF to GnuCash

2018-07-31 Thread John Ralls



> On Jul 31, 2018, at 11:52 AM, Tommy Trussell  wrote:
> 
> On Sat, Jul 28, 2018 at 1:08 AM jeffrey black 
> wrote:
> 
>> As near as I understand it, Quickbooks imports a specially formatted pdf
>> file of a statement for reconciliation.  I am sure there is a large
>> amount of money flowing between Quickbooks and Adobe for this right.
>> Adobe has gone to great lengths to make their files viewable and printer
>> printable only, unless you pay fees for features that used to be built
>> in, like export to M$document files (which I used to use extensively for
>> university extension publications).
>> 
> 
> At some point Adobe declared PDF to be an "open" format, so in many cases
> you can peek inside and do something interesting with the files. PDFs are
> "container" files and can contain more than one representation of a
> document at a time.
> 
> The kind of PDFs that get generated by desktop applications and such
> generally contain an abbreviated version of the PostScript page declaration
> language. A PDF generated by a scanner application normally contains a
> compressed TIFF image because that's directly compatible with fax software.
> (And even if it isn't, ImageMagick can generally convert to whatever image
> format you need.)
> 
> Some widely available applications, such as LibreOffice, can generate a PDF
> with multiple items in the container at once. LibreOffice calls theirs a
> "Hybrid PDF," and those PDF files contain the PostScript image AND the
> document's editable source.
> 
> All this to say... If you acquired of one of the "specially formatted" PDF
> documents intended for Quickbooks, I wonder what other document type might
> they have they embedded into the file? For Quickbook's purposes it would
> likely be a Quickbooks or OFX file because parsing the PostScript or
> another image file format might be too unreliable. Of course they might do
> something uncharitable like encrypt it or even compress it in an unusual
> fashion to make reverse-engineering it a hurdle.
> 
> Most linux distributions include several useful PDF parsing and
> manipulation utilities, so conceivably extracting useful data might be
> relatively straightforward with a bit of command-line tinkering.

Careful. Intuit very likely has a "no reverse engineering" clause in their EULA 
and prying into their "special" PDF format in order to enable a competing 
product, even (or maybe especially) a FLOSS one, is likely to get one some 
attention from their lawyers.

Regards,
John Ralls




Re: [GNC-dev] Import PDF to GnuCash

2018-07-31 Thread Tommy Trussell
On Sat, Jul 28, 2018 at 1:08 AM jeffrey black 
wrote:

> As near as I understand it, Quickbooks imports a specially formatted pdf
> file of a statement for reconciliation.  I am sure there is a large
> amount of money flowing between Quickbooks and Adobe for this right.
> Adobe has gone to great lengths to make their files viewable and printer
> printable only, unless you pay fees for features that used to be built
> in, like export to M$document files (which I used to use extensively for
> university extension publications).
>

At some point Adobe declared PDF to be an "open" format, so in many cases
you can peek inside and do something interesting with the files. PDFs are
"container" files and can contain more than one representation of a
document at a time.

The kind of PDFs that get generated by desktop applications and such
generally contain an abbreviated version of the PostScript page description
language. A PDF generated by a scanner application normally contains a
compressed TIFF image because that's directly compatible with fax software.
(And even if it isn't, ImageMagick can generally convert to whatever image
format you need.)

Some widely available applications, such as LibreOffice, can generate a PDF
with multiple items in the container at once. LibreOffice calls theirs a
"Hybrid PDF," and those PDF files contain the PostScript image AND the
document's editable source.

All this to say... If you acquired one of the "specially formatted" PDF
documents intended for QuickBooks, I wonder what other document type they
might have embedded into the file? For QuickBooks' purposes it would
likely be a Quickbooks or OFX file because parsing the PostScript or
another image file format might be too unreliable. Of course they might do
something uncharitable like encrypt it or even compress it in an unusual
fashion to make reverse-engineering it a hurdle.

Most linux distributions include several useful PDF parsing and
manipulation utilities, so conceivably extracting useful data might be
relatively straightforward with a bit of command-line tinkering.
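
As a starting point for that tinkering, here is a small Python sketch that runs
a few poppler-utils commands against a PDF to see what is inside: fonts (a hint
that real text is present), images, embedded attachments, and the text layer.
It assumes poppler-utils is installed; the function names `probe_commands` and
`run_probes` and the file name `statement.pdf` are only illustrative.

```python
import shutil
import subprocess


def probe_commands(pdf_path):
    """Return the poppler-utils command lines used to inspect a PDF."""
    return {
        "fonts": ["pdffonts", pdf_path],                  # no fonts: probably a scan
        "images": ["pdfimages", "-list", pdf_path],       # list embedded images
        "attachments": ["pdfdetach", "-list", pdf_path],  # list embedded files
        "text": ["pdftotext", "-layout", pdf_path, "-"],  # text layer to stdout
    }


def run_probes(pdf_path):
    """Run each probe; a value of None means the tool is not installed."""
    results = {}
    for name, argv in probe_commands(pdf_path).items():
        if shutil.which(argv[0]) is None:
            results[name] = None
            continue
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[name] = done.stdout
    return results


if __name__ == "__main__":
    for name, output in run_probes("statement.pdf").items():
        print(f"== {name} ==")
        print(output if output is not None else "(tool not found)")
```

If the attachments listing comes back empty and the text probe returns the
transactions, the PDF is most likely just formatted text rather than a
container hiding a second file format.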


Re: [GNC-dev] Import PDF to GnuCash

2018-07-28 Thread jeffrey black
On 7/27/2018 11:22 PM, David Cousens wrote:
> deltatango,
>
> How does Quickbooks use the imported information from a statement? The only
> normal use I can think of is reconciliation of the account. For importing
> information I by far prefer OFX with csv as a fallback for those
> institutions which don't provide OFX files (Paypal is  a standout here). For
> reconciliation, I usually prefer a printed copy so I can tick transactions
> off as I reconcile them in GnuCash. I have used spreadsheets as well to mark
> off reconciled transactions where the statement was available in csv.
>
> David
>
>
>
> -
> David Cousens
> --
> Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html
>
As near as I understand it, Quickbooks imports a specially formatted pdf 
file of a statement for reconciliation.  I am sure there is a large 
amount of money flowing between Quickbooks and Adobe for this right.  
Adobe has gone to great lengths to make their files viewable and printer 
printable only, unless you pay fees for features that used to be built 
in, like export to M$document files (which I used to use extensively for 
university extension publications).

Different banks, depending on their financial ability (homegrown vs 
multi state vs national vs international) have different methods of 
providing transaction detail.  My local bank only provided PDF, CSV, and 
eventually QIF, until they got bought out. But then Jack Henry and 
Associates installed their so-called "real time" accounting system, which 
is seriously foo-bared.  Now I have the option of OFX with the new 
ownership.  The only advantage is that GnuCash is optimized for OFX now.

If I go back more than 3 months, then everything is PDF only for 18 
months (no check pictures, costs money), more than that and you pay 
through the nose for hard copy records.  And before you rebut, I am 
more than 3 months behind on some of the accounts I am supposed to be 
tracking.  Getting kicked in the nuts by a bull tends to affect your 
lifestyle.  Fair play I guess, he was supposed to be a steer.

You and I can argue about the benefits of OFX vs QIF on private email.  
QIF better standard is than OFX.  Glove thrown, challenged are you.  
Yoda, love must you. (:->).

--JEffrey Black M.B.A.
  



Re: [GNC-dev] Import PDF to GnuCash

2018-07-27 Thread David Cousens
deltatango,

How does Quickbooks use the imported information from a statement? The only
normal use I can think of is reconciliation of the account. For importing
information I by far prefer OFX with csv as a fallback for those
institutions which don't provide OFX files (PayPal is a standout here). For
reconciliation, I usually prefer a printed copy so I can tick transactions
off as I reconcile them in GnuCash. I have used spreadsheets as well to mark
off reconciled transactions where the statement was available in csv.

David



-
David Cousens
--
Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html


Re: [GNC-dev] Import PDF to GnuCash

2018-07-27 Thread jeffrey black
On 7/26/2018 2:56 PM, deltatango wrote:
> Hello,
>
> Very interested in the possibility of importing PDF statements into GnuCash.
>
> I know Quickbooks now has this functionality.
>
> I searched online and found a few clunky possibilities that would convert
> the data into excel which can then be converted to csv and then imported
> into GnuCash.
>
> I was envisioning a system where you select a PDF statement to be imported.
>
> The program then asks you to select the area of the statement which contains
> the transactions, much like a photoshop selection. (And perhaps you could
> save templates of selections for different statements).
>
> Then some kind of OCR scanning reads the columns and data and convert it to
> columns/rows.
>
> Is this in the realm of possibility for some future release?
>
> It is so common now that exporting csv or qfx ,etc files from your bank only
> go so far back and you have to download PDFs instead...
>
> I dream, I hope...
>
> But in vain I wish not...
>
>
>
> --
> Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html
>
I don't think this is an option with PDF files in the near future.  
Adobe has gone to great effort to make sure that all of the user 
friendly features of PDF reader are now a paid feature. Quickbook$ can 
afford to add this feature because of the high fees they charge you for 
"improvements and features".

As Jim DeLaHunt mentioned you might find some AI/Auto-learning software 
to convert it to a usable format, though I wouldn't hold my breath at 
this time.

Right now, the only suggestion I can offer, other than finding another 
type of input file, is to print the PDF and then scan it with OCR 
software so you can try to extract the information you are after.

--JEffrey Black M.B.A.
  



Re: [GNC-dev] Import PDF to GnuCash

2018-07-26 Thread Bob Gustafson

Take a look at Tesseract

https://github.com/tesseract-ocr/tesseract
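
For table-like data, Tesseract's TSV output is probably more useful than its
plain text output, because every recognized word comes with its block, line,
and pixel position. As far as I know, something like
`tesseract page.png stdout tsv` produces it. The Python sketch below only
parses such TSV text and regroups the words into lines; the function name
`words_by_line` and the sample data are made up for illustration, and the
sample was typed by hand rather than produced by a real OCR run.

```python
import csv
import io
from collections import defaultdict

TSV_HEADER = ["level", "page_num", "block_num", "par_num", "line_num",
              "word_num", "left", "top", "width", "height", "conf", "text"]


def words_by_line(tsv_text):
    """Group word rows (level 5) of Tesseract TSV output into text lines."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = defaultdict(list)
    for row in reader:
        text = (row.get("text") or "").strip()
        if row["level"] != "5" or not text:
            continue  # skip page, block, paragraph, and line summary rows
        key = (int(row["page_num"]), int(row["block_num"]),
               int(row["par_num"]), int(row["line_num"]))
        lines[key].append((int(row["left"]), text))
    # Order lines by their position key and words from left to right.
    return [[word for _, word in sorted(words)]
            for _, words in sorted(lines.items())]


# Hand-written sample in the TSV column layout (not real OCR output).
_ROWS = [
    ["1", "1", "0", "0", "0", "0", "0", "0", "800", "600", "-1", ""],
    ["5", "1", "1", "1", "1", "1", "40", "100", "90", "12", "96", "2018-07-02"],
    ["5", "1", "1", "1", "1", "2", "150", "100", "60", "12", "93", "GROCERY"],
    ["5", "1", "1", "1", "1", "3", "520", "100", "50", "12", "91", "-45.10"],
    ["5", "1", "1", "1", "2", "1", "40", "120", "90", "12", "95", "2018-07-05"],
    ["5", "1", "1", "1", "2", "2", "150", "120", "60", "12", "94", "PAYROLL"],
    ["5", "1", "1", "1", "2", "3", "510", "120", "60", "12", "92", "1200.00"],
]
SAMPLE_TSV = "\n".join("\t".join(r) for r in [TSV_HEADER] + _ROWS) + "\n"

if __name__ == "__main__":
    for line in words_by_line(SAMPLE_TSV):
        print(line)
```

The `left` values are what would let a later step decide which column a word
belongs to; the OCR mistakes themselves would still need manual correction.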


On 07/26/2018 02:56 PM, deltatango wrote:


Hello,

Very interested in the possibility of importing PDF statements into GnuCash.

I know Quickbooks now has this functionality.

I searched online and found a few clunky possibilities that would convert
the data into excel which can then be converted to csv and then imported
into GnuCash.

I was envisioning a system where you select a PDF statement to be imported.

The program then asks you to select the area of the statement which contains
the transactions, much like a photoshop selection. (And perhaps you could
save templates of selections for different statements).

Then some kind of OCR scanning reads the columns and data and convert it to
columns/rows.

Is this in the realm of possibility for some future release?

It is so common now that exporting csv or qfx ,etc files from your bank only
go so far back and you have to download PDFs instead...

I dream, I hope...

But in vain I wish not...



--
Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html


Re: [GNC-dev] Import PDF to GnuCash

2018-07-26 Thread John Ralls



> On Jul 26, 2018, at 12:56 PM, deltatango  wrote:
> 
> Hello,
> 
> Very interested in the possibility of importing PDF statements into GnuCash.
> 
> I know Quickbooks now has this functionality.
> 
> I searched online and found a few clunky possibilities that would convert
> the data into excel which can then be converted to csv and then imported
> into GnuCash.
> 
> I was envisioning a system where you select a PDF statement to be imported.
> 
> The program then asks you to select the area of the statement which contains
> the transactions, much like a photoshop selection. (And perhaps you could
> save templates of selections for different statements).
> 
> Then some kind of OCR scanning reads the columns and data and convert it to
> columns/rows.
> 
> Is this in the realm of possibility for some future release?
> 
> It is so common now that exporting csv or qfx ,etc files from your bank only
> go so far back and you have to download PDFs instead...
> 
> I dream, I hope...
> 
> But in vain I wish not...

It's very unlikely with the current very small development team.  We have other 
priorities that we expect to keep us fully engaged far into the future.

Regards,
John Ralls



Re: [GNC-dev] Import PDF to GnuCash

2018-07-26 Thread Jim DeLaHunt

Hello, Delta Tango:

I am not one of the Gnucash developers, but I am a software engineer who 
used to work for Adobe Systems (the creator of PDF), and so my ears 
perked up at your question.


On 2018-07-26 12:56, deltatango wrote:

Hello,

Very interested in the possibility of importing PDF statements into GnuCash.

I know Quickbooks now has this functionality.
Fascinating!  Could you perhaps send a link to a Quicken page explaining 
what PDF import functionality Quicken has?  I see this page 
 
which says, "Please be aware that Quicken cannot import … PDF … files."

…I was envisioning a system where you select a PDF statement to be imported.

The program then asks you to select the area of the statement which contains
the transactions, much like a photoshop selection. (And perhaps you could
save templates of selections for different statements).

Then some kind of OCR scanning reads the columns and data and convert it to
columns/rows.

Is this in the realm of possibility for some future release?
…


I am not a Gnucash developer, so I can't speak about what is in the 
realm of possibility for GnuCash.


I can speak about importing PDF files, in general.

PDF is a container; it can contain different kinds of content which 
might look the same to a human reading the PDF file. It might have a 
collection of commands, "use this font, draw this number '€2.500,00' at 
this location on the page". It might have those same commands, with 
annotations saying, "this is a subtotal". Or, it might have a bitmapped 
image which is a picture of a printed page with those numbers.  
Importing each of those kinds of content is a very different task.


It is like the answer to the question, "is it easy to pour out the 
contents of a cup?"  If the cup contains water: very easy. If the cup 
contains paint: easy for 80%, and you have to use a scraper to get out 
the other 20%. If the cup contains dried concrete: very hard.


If a PDF has a special kind of content which is marked up for easy 
extraction, then maybe it would be less effort to make an importer for 
GnuCash. If the PDF does not have markup, but does have commands to draw 
numbers at specific places, then my guess is that any importer would be 
doing the same thing as a tool that converts the PDF file to a CSV file, 
then imports the CSV file into GnuCash.
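
To make that middle case concrete, here is a minimal Python sketch of the core
step such a converter would need: taking text fragments with page positions and
regrouping them into rows and columns. It assumes some other tool has already
produced `(x, y, text)` tuples and that y grows downward; the function name
`fragments_to_rows`, the tolerance value, and the sample coordinates are all
hypothetical.

```python
import csv
import io


def fragments_to_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows by y, then order cells by x."""
    rows = []
    for x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        # A fragment joins the current row if its baseline is close enough
        # to the baseline of the fragment that started that row.
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


def rows_to_csv(rows):
    """Serialize rows as CSV text for a later CSV import."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up fragments, deliberately out of order and with slightly uneven y.
SAMPLE_FRAGMENTS = [
    (300.0, 100.4, "-45.10"),
    (40.0, 100.0, "2018-07-02"),
    (120.0, 99.8, "GROCERY STORE"),
    (40.0, 115.0, "2018-07-05"),
    (300.0, 115.2, "1200.00"),
    (120.0, 115.1, "PAYROLL"),
]

if __name__ == "__main__":
    print(rows_to_csv(fragments_to_rows(SAMPLE_FRAGMENTS)), end="")
```

The hard part in practice is not this grouping but deciding which rows are
transactions and which are headers, totals, or wrapped descriptions, and that
differs from one bank's statement layout to the next.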


If the PDF file contains an image, the best you can do is perform OCR, 
then correct the OCR mistakes. Then you have a PDF file with commands to 
draw numbers at specific places, which you handle as in the case above.


It is conceivable to write a machine learning / artificial intelligence 
program to convert PDF files with statements into a data format which is 
practical to import to GnuCash. But the starting requirement for this is 
tens of thousands of PDF files with example statements, and perhaps the 
tens of thousands of data files corresponding to the PDF files, to use 
as references for the learning.


Now, a lot has changed since I worked on PDF. Maybe something new is 
possible that I don't know about. But this is the situation as I see it.


Best regards,

 —Jim DeLaHunt, Vancouver, Canada

--
--Jim DeLaHunt, j...@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
  multilingual websites consultant

  355-1027 Davie St, Vancouver BC V6E 4L2, Canada
 Canada mobile +1-604-376-8953



[GNC-dev] Import PDF to GnuCash

2018-07-26 Thread deltatango
Hello,

Very interested in the possibility of importing PDF statements into GnuCash.

I know Quickbooks now has this functionality.

I searched online and found a few clunky possibilities that would convert
the data into excel which can then be converted to csv and then imported
into GnuCash.

I was envisioning a system where you select a PDF statement to be imported.

The program then asks you to select the area of the statement which contains
the transactions, much like a photoshop selection. (And perhaps you could
save templates of selections for different statements).

Then some kind of OCR scanning reads the columns and data and converts them to
columns/rows.

Is this in the realm of possibility for some future release?

It is so common now that exporting CSV, QFX, etc. files from your bank only
goes so far back and you have to download PDFs instead...

I dream, I hope...

But in vain I wish not...



--
Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html