Re: optimizing reports

2007-10-10 Thread Josh Sled
Andrew Sackville-West [EMAIL PROTECTED] writes:
 I suppose this raises the larger issue of what is going on long-term
 with the reports. Is the amount of work necessary to fix the
 performance issues sufficient to warrant looking at the long term goal
 with reporting with an eye to just doing that change now? I know
 Derek wants to work e-guile into the mix and implement some kind of
 templating. Others have suggested binding in a whole 'nother language
 for reporting. I don't know the answers and lack the knowledge,
 experience and position to answer them.

 Should the reporting structure be re-worked altogether? now or later?
 Should the current structure be cleaned up in parallel to that effort
 to improve performance in the interim? Or should it be left as it is,
 just improved?

(What follows are excerpts from some notes I've been writing on the
subject.)


Reports have some problems:

- defined in scheme, hard to modify, weird errors, etc.
- emit HTML, no conventional templating
- bad html, no css
- name-based identity
- options are stored as scheme code
- strange to add new report types
- report options are inconsistent
- report options dialogs are not HIG-compliant

In particular, the storage of report options and saved/open reports as
evaluated lisp expressions tightly couples us to a particular technology for
the reports.

The rough form of a revised reporting infrastructure I'd like to see is:

- reports declared as data, rather than registered as code.
- separation of the report generation code from the report rendering
  code. something like:

     book
       v        generator, report-provided
  report-model  (dict)
       v        renderer (template application)
  report-output (html)
       v        HTML engine, GOG
    display
       v
  {screen,file}

Here the report generator emits only a data structure (language
neutral, dictionary-list-string-number-boolean, etc.) that is the input to the
rendering phase.  This split supports separation of concerns, layering, and
independent evolution, development and testing of each phase.
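
A minimal sketch of that layering, in Python purely for illustration.  The
function names, the shape of the "book" and the templates are all
assumptions, not existing GnuCash code; the point is only that the generator
returns plain data and the renderer never sees the book.

```python
from html import escape
from string import Template

def generate_model(book):
    """Generator step: reduce the book to plain data (dicts, lists, strings, numbers)."""
    rows = [{"account": name, "balance": sum(amounts)}
            for name, amounts in sorted(book.items())]
    return {"title": "Income Statement", "rows": rows}

# Hypothetical templates; a real report would ship these as separate files.
ROW = Template("<tr><td>$account</td><td>$balance</td></tr>")
PAGE = Template("<html><body><h1>$title</h1><table>$rows</table></body></html>")

def render_html(model):
    """Renderer step: apply templates to the model; knows nothing about the book."""
    rows = "".join(
        ROW.substitute(account=escape(r["account"]), balance="%.2f" % r["balance"])
        for r in model["rows"])
    return PAGE.substitute(title=escape(model["title"]), rows=rows)

# Hypothetical stand-in for a book: account name -> list of amounts.
book = {"Salary": [1000.0, 1000.0], "Rent": [-400.0]}
model = generate_model(book)
page = render_html(model)
```

Because the model is just nested dicts and lists, the generator can be tested
without any HTML engine, and the renderer can be tested with hand-written
models.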


I think a report might be well-suited to be a bundle ... some structured,
self-contained collection of files.  It is a ${format} archive with a
known-location manifest file report.def, which contains the basic report
definition.  By example:

foo.tar:
- report.def
- report.script
- template.html
- local.css

report.def:

[gnucash-report-v2]

name = Foo Report
desc = The ISO-1234-26b foo report.

id = 0a1b2c3d4e5f6a7b8c9d0e1f

parent = assets-expense

report-type = scm
load-files = report.asl, helper.asl
options-entry-point = foo-options
generator-entry-point = foo-report
renderer-entry-point = template_apply

template-file = template.thtml

name.fr = Foo Réport
desc.fr = Lé réport du generallies foo...

A goal would be to have something like ~/.gnucash/reports.d/, where a user
could publish a new report (as that single archive), and other users could
simply save it there and have it appear in their gnucash instance.
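
As a rough sketch of how such a bundle could be read, assuming the archive
is a tar file and that report.def parses as INI (both are assumptions; the
notes above leave ${format} open).  The helper names read_manifest and
make_bundle are hypothetical.

```python
import configparser
import io
import tarfile

def read_manifest(archive_bytes):
    """Return the [gnucash-report-v2] section of report.def as a dict."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        text = tar.extractfile("report.def").read().decode("utf-8")
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = dict(parser["gnucash-report-v2"])
    # Split the comma-separated file list into a real list.
    section["load-files"] = [f.strip()
                             for f in section.get("load-files", "").split(",")
                             if f.strip()]
    return section

def make_bundle(files):
    """Build an in-memory tar archive from a {name: text} dict (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

REPORT_DEF = """\
[gnucash-report-v2]
name = Foo Report
id = 0a1b2c3d4e5f6a7b8c9d0e1f
report-type = scm
load-files = report.asl, helper.asl
generator-entry-point = foo-report
"""

bundle = make_bundle({"report.def": REPORT_DEF, "template.html": "<html></html>"})
manifest = read_manifest(bundle)
```

A loader would run something like read_manifest over every archive found in
the reports directory and register each report under its id rather than its
name.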


On the front-end, we should move to a more conventional scripting language
for generation (perl, python, ruby) and a template-based rendering solution.
The output should be consumed by gecko rather than gtkhtml, as gecko doesn't
suck.

I could see a time where we are in transition, and have both v1 (existing)
and v2 (proposed here) reports co-implemented.


With respect to the Options, we should convert the options implementation
from scheme + closures to C + GObject/GInterface + signals.  The existing
saved/open reports are all basically of the form:

  (let ((optionDb (report-default-options report-name)))
    (set optionDb 'optionA new-value)
    (set optionDb 'optionB new-value)
    (create-report report-name optionDb))

As such, we strongly require a guile interpreter to parse/eval all this.  As
the options are moved into C, they'll need bindings to support at least
handling these existing files, but should then serialize back into a
non-guile-specific format.
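
One possible shape for such a non-guile-specific format, sketched in Python
with JSON as the on-disk form.  JSON, the DEFAULTS registry and the function
names are assumptions for illustration only; the sketch mirrors the
"defaults, then overrides" structure of the existing saved reports.

```python
import json

# Hypothetical registry of per-report default options.
DEFAULTS = {"foo-report": {"optionA": 1, "optionB": "monthly"}}

def save_report(report_name, options):
    """Serialize only the options that differ from the report's defaults."""
    defaults = DEFAULTS[report_name]
    changed = {k: v for k, v in options.items() if defaults.get(k) != v}
    return json.dumps({"report": report_name, "options": changed}, sort_keys=True)

def load_report(text):
    """Rebuild the full option set: start from defaults, apply saved overrides."""
    saved = json.loads(text)
    options = dict(DEFAULTS[saved["report"]])
    options.update(saved["options"])
    return saved["report"], options

text = save_report("foo-report", {"optionA": 1, "optionB": "weekly"})
name, options = load_report(text)
```

Nothing in the saved text needs to be evaluated, so any language with a JSON
parser could read it.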

I've started down this path on a private source tree.

-- 
...jsled
http://asynchronous.org/ - a=jsled; b=asynchronous.org; echo [EMAIL PROTECTED]


___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel


Re: optimizing reports

2007-10-07 Thread Dan Widyono
As a humble suggestion, it might pay large dividends to invest some time
researching scheme profiling methods.  This would cover Christian's ideas of
checking what was really bogging down the report(s), and would help you or
anyone else fix or speed up other reports in the future.  Perhaps folks
here already know of something.

Google popped this up in a brief perusal just now:
http://www.cygwin.com/ml/guile/2000-07/msg00206.html

BEGIN QUOTE
 Is there any Scheme code profiler that works with Guile?
 It seems Guile's core (libguile/eval.c) has no such code in it.
 Is it a good idea to work on this?  (I guess the debug evaluator
 may have such facilities...)

This is actually fairly easy.  Even the patch below gives some
useful information:

  % guile
  guile> (set! *profile-all* #t)
  guile> (use-modules (oop goops))
  guile> (load "profile.scm")
END QUOTE

This post included a short patch to guile source.  This is just an example, I
don't think it will really work (based on comments in follow-ups to that
post), but hopefully things have improved since 2000 when that post was made.

Regards,
Dan W.

 okay, thanks for this history. I agree (as I think everyone does) that
 there are some significant performance issues. As far as the
 layout/html stuff goes, I really don't care, but the performance is a
 huge factor.

FWIW, Firefox is pretty nimble for me most of the time.  Except when it comes
to rendering large tables.  Therefore, layout is intimately tied to
performance in this one particular extreme case (which happens to be
not-quite-just-a-corner-case in GnuCash reports).


Re: optimizing reports

2007-10-07 Thread Derek Atkins
Andrew Sackville-West [EMAIL PROTECTED] writes:

 I suppose this raises the larger issue of what is going on long-term
 with the reports. Is the amount of work necessary to fix the
 performance issues sufficient to warrant looking at the long term goal
 with reporting with an eye to just doing that change now? I know
 Derek wants to work e-guile into the mix and implement some kind of
 templating. Others have suggested binding in a whole 'nother language
 for reporting. I don't know the answers and lack the knowledge,
 experience and position to answer them.

 Should the reporting structure be re-worked altogether? now or later?
 Should the current structure be cleaned up in parallel to that effort
 to improve performance in the interim? Or should it be left as it is,
 just improved?

My personal opinion:  I'd add in e-guile if I could find a
free weekend to actually hack on GnuCash.  I think it's a short
term fix for the templating issue, not a long-term solution.

In the long-term I think we need to change both the reporting
infrastructure and the display methodology.  I.e., I think we 
want to swap out GtkHTML for Gecko, and probably, simultaneously,
drop in a new reporting infrastructure.   Granted, these are
PROBABLY separable projects, but why not work to get it done
at the same time?

I think the newer system should definitely be template-based, and we
can choose whatever template wrapper language seems correct at the
time.

 A

-derek

-- 
   Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
   Member, MIT Student Information Processing Board  (SIPB)
   URL: http://web.mit.edu/warlord/PP-ASEL-IA N1NWH
   [EMAIL PROTECTED]PGP key available


Re: optimizing reports

2007-10-07 Thread Andrew Sackville-West
On Sun, Oct 07, 2007 at 09:48:00AM -0400, Derek Atkins wrote:
 
 My personal opinion:  I'd add in e-guile if I could find a
 free weekend to actually hack on GnuCash.  I think it's a short
 term fix for the templating issue, not a long-term solution.

It's more complicated than I can handle at *this* point. Though I think
I understand the gist of it, the implementation is beyond me -- I lack
the knowledge of the existing codebase and some understanding of how
it all flows. It's been percolating in my brain for a couple years now
(and in fact I've got a tarball of e-guile sitting here), but I've not
looked at gnucash code *at all* since then.

So, based on that, I'm going to forge ahead with the current project
(fixing up the income statement so that it runs faster). That'll do
two things: 1) make my accounting go much faster and 2) get me familiar
with the code again so that maybe in the future I can help with the
below.

A

 
 In the long-term I think we need to change both the reporting
 infrastructure and the display methodology.  I.e., I think we 
 want to swap out GtkHTML for Gecko, and probably, simultaneously,
 drop in a new reporting infrastructure.   Granted, these are
 PROBABLY separable projects, but why not work to get it done
 at the same time?
 
 I think the newer system should definitely be template-based, and we
 can choose whatever template wrapper language seems correct at the
 time.
 
  A
 
 -derek
 
 -- 
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board  (SIPB)
URL: http://web.mit.edu/warlord/PP-ASEL-IA N1NWH
[EMAIL PROTECTED]PGP key available
 



Re: optimizing reports

2007-10-06 Thread Christian Stimming
On Saturday, 6 October 2007 05:28, Andrew Sackville-West wrote:
 I'm bent on improving the speed of some of these reports 

That's great! I think I've mentioned that before, but I'm also very unhappy 
with the current speed aka slowness of the text reports. I'm especially 
unhappy about that because my initial implementation of those was 
significantly faster, and only the rewriting of those reports by the changes 
mainly in r10078 (David Montenegro, June/July 2004) made them much slower 
(see my gnucash-devel msgs on 2006-06-20 and 2006-03-19).

 My investigation into this points to the function
 gnc:html-acct-table-add-accounts! in
 reports/report-system/html-acct-table.scm. This function appears to be
 the workhorse of the income statement but is also used in balance
 sheet, budget and trial balance reports, so fixing it would likely
 help those guys as well.

Yes. But we would have to find out more specifically the sub-functions where 
the CPU time is actually spent.

 Currently the report recurses through the account tree gathering
 totals for each account and its sub accounts. It appears that it walks
 all the way out to the leaf nodes at each level, so that the sub
 account totals get calculated repeatedly making this a hugely
 inefficient function. 

Yes. However, the previous implementation did exactly the same in terms of 
balance calculation. It is still available in html-utilities.scm's function 
gnc:html-build-acct-table and in particular add-group! there. 

For that reason I believe the balance calculation itself is probably not
the main problem. The main problem must be somewhere else around this...
for example, the newer code might run the balance calculation on many more
accounts than the old code; or the large (append env ...) statement in
html-acct-table.scm:746 ff might consume a lot of time; or yet something
else.

 I want to clean that up and what I'm thinking is to recurse through
 the tree once totalling up each relevant account and returning those
 totals in some structure that contains the accounts and their
 totals. Then walk through the tree generating the output table based
 on the required depth. This means I'd still be walking a tree
 structure twice, but I'd only be doing the per-account math once. 

Again, this might indeed help, but on the other hand this inefficiency was 
present in the earlier implementation and didn't seem to cause big problems 
there. 

It might still be reasonable to work on this part, but maybe it would pay off 
to examine a bit more whether this is really the trouble-causing part.
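
The kind of measurement meant here can be sketched as follows.  This is
Python, not Scheme, and the two decorated functions are hypothetical
stand-ins rather than the real report code; it only illustrates counting
calls and accumulating wall-clock time per sub-function before deciding what
to rewrite.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function call counts and inclusive wall-clock time.
STATS = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

def timed(func):
    """Wrap func so that every call is counted and timed."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = STATS[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper

@timed
def compute_balance(account):      # hypothetical stand-in for the balance math
    return sum(account["splits"])

@timed
def build_row(account):            # hypothetical stand-in for table building
    return (account["name"], compute_balance(account))

accounts = [{"name": "A%d" % i, "splits": [1.0] * 100} for i in range(50)]
rows = [build_row(a) for a in accounts]

for name, entry in sorted(STATS.items()):
    print("%-16s calls=%d seconds=%.6f" % (name, entry["calls"], entry["seconds"]))
```

The times are inclusive, so build_row's figure contains compute_balance's;
an unexpectedly high call count is often the more telling number.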

Christian


Re: optimizing reports

2007-10-06 Thread Andrew Sackville-West
On Sat, Oct 06, 2007 at 11:23:53AM +0200, Christian Stimming wrote:
 On Saturday, 6 October 2007 05:28, Andrew Sackville-West wrote:
  I'm bent on improving the speed of some of these reports 
 
 That's great! I think I've mentioned that before, but I'm also very unhappy 
 with the current speed aka slowness of the text reports. I'm especially 
 unhappy about that because my initial implementation of those was 
 significantly faster, and only the rewriting of those reports by the changes 
 mainly in r10078 (David Montenegro, June/July 2004) made them much slower 
 (see my gnucash-devel msgs on 2006-06-20 and 2006-03-19).

I'll check those out. thanks.

 
  My investigation into this points to the function
  gnc:html-acct-table-add-accounts! in

...

 
 Yes. But we would have to find out more specifically the sub-functions where 
 the CPU time is actually spent.
...

 be the main problem. The main problem must be somewhere else around this... 
 for example, the newer code might run the balance calculation on many more
 accounts than the old code; or the large (append env ...) statement in
 html-acct-table.scm:746 ff might consume a lot of time; or yet something 
 else.

I'll spend some more time trying to narrow it down. There is one part
that references (in the comments) recursing over accounts that aren't
even used.



 
  I want to clean that up and what I'm thinking is to recurse through
  the tree once totalling up each relevant account and returning those
  totals in some structure that contains the accounts and their
  totals. Then walk through the tree generating the output table based
  on the required depth. This means I'd still be walking a tree
  structure twice, but I'd only be doing the per-account math once. 
 
 Again, this might indeed help, but on the other hand this inefficiency was 
 present in the earlier implementation and didn't seem to cause big problems 
 there. 
 
 It might still be reasonable to work on this part, but maybe it would pay off 
 to examine a bit more whether this is really the trouble-causing part.

In all honesty, and in due respect to whoever rewrote that part, it
looks like it was written by someone who doesn't know scheme or
functional languages in general. I'll admit that I have next to no
experience except that my earlier years of hacking involved some use
of functional languages and they just seem to work for my brain. So
there are lots of things that are done in that function that appear to
my eye to be done very inefficiently. I saw that and sort of assumed
that the problem was the overall architecture of the function.

Of course, I could be completely wrong and it could certainly be any
number of other things called from within that function. I'll dig in
some more and see if I can narrow it down to more than just "the
whole function is slow". Though I still believe that to be the
case. :)

thanks Christian, I'll be in touch.

A




Re: optimizing reports

2007-10-06 Thread Andrew Sackville-West
On Sat, Oct 06, 2007 at 09:11:57AM -0700, Andrew Sackville-West wrote:

...
 
 In all honesty, and in due respect to whoever rewrote that part, it
 looks like it was written by someone who doesn't know scheme or
 functional languages in general. 

that came out way worse sounding than I meant or believe. I hope no
one was offended. And I apologise.

 I'll admit that I have next to no
 experience except that my earlier years of hacking involved some use
 of functional languages and they just seem to work for my brain. 

emphasis on the "next to no experience", and that's double-plus true.

humbly

A




Re: optimizing reports

2007-10-06 Thread Andrew Sackville-West
On Sat, Oct 06, 2007 at 11:23:53AM +0200, Christian Stimming wrote:
 On Saturday, 6 October 2007 05:28, Andrew Sackville-West wrote:
  I'm bent on improving the speed of some of these reports 
 
 That's great! I think I've mentioned that before, but I'm also very unhappy 
 with the current speed aka slowness of the text reports. I'm especially 
 unhappy about that because my initial implementation of those was 
 significantly faster, and only the rewriting of those reports by the changes 
 mainly in r10078 (David Montenegro, June/July 2004) made them much slower 
 (see my gnucash-devel msgs on 2006-06-20 and 2006-03-19).

okay, thanks for this history. I agree (as I think everyone does) that
there are some significant performance issues. As far as the
layout/html stuff goes, I really don't care, but the performance is a
huge factor. I think that the formatting issues could be dealt with at
a later time, especially in light of how long it would take to modify
code, run test reports, and look at results with really slow reports ;)

I suppose this raises the larger issue of what is going on long-term
with the reports. Is the amount of work necessary to fix the
performance issues sufficient to warrant looking at the long term goal
with reporting with an eye to just doing that change now? I know
Derek wants to work e-guile into the mix and implement some kind of
templating. Others have suggested binding in a whole 'nother language
for reporting. I don't know the answers and lack the knowledge,
experience and position to answer them.

Should the reporting structure be re-worked altogether? now or later?
Should the current structure be cleaned up in parallel to that effort
to improve performance in the interim? Or should it be left as it is,
just improved?

A




optimizing reports

2007-10-05 Thread Andrew Sackville-West
Hi guys, 

I'm bent on improving the speed of some of these reports and want to
bounce some ideas off you folks. The particular report I'm interested
in improving is the income statement. On my smaller file (133k) it
takes about 20 seconds to run a bog standard income statement. I
haven't timed it on my larger file (1.4M), but it takes way too
long (I usually switch over to /. please help me!).

My investigation into this points to the function
gnc:html-acct-table-add-accounts! in
reports/report-system/html-acct-table.scm. This function appears to be
the workhorse of the income statement but is also used in balance
sheet, budget and trial balance reports, so fixing it would likely
help those guys as well. 

Currently the report recurses through the account tree gathering
totals for each account and its sub accounts. It appears that it walks
all the way out to the leaf nodes at each level, so that the sub
account totals get calculated repeatedly making this a hugely
inefficient function. For example, given this:

Toplvl --- A --- A1
             |-- A2
             |-- A3
             |-- A4 --- A4a
                    |-- A4b
 
To calculate the balances of all these, it would calculate the whole
tree for the balance of Toplvl; re-calculate all of A sub-tree to get A's
balance; re-calculate A1; re-calc A2; re-calc A3; re-calc the whole A4
sub-tree; then re-calc A4a; re-calc A4b etc etc etc... bad.
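
To put a number on that for the example tree, here is a small Python count.
It models the behaviour as described in this message, not the actual code in
html-acct-table.scm, and treats every account touched while computing a
balance as one "visit".

```python
# The example tree: parent -> list of children.
TREE = {
    "Toplvl": ["A"],
    "A": ["A1", "A2", "A3", "A4"],
    "A4": ["A4a", "A4b"],
}

def subtree(name):
    """Yield name and every account below it."""
    yield name
    for child in TREE.get(name, []):
        yield from subtree(child)

# Described behaviour: each account's balance re-walks its whole sub-tree.
naive_visits = sum(len(list(subtree(name))) for name in subtree("Toplvl"))

# One walk that touches every account exactly once.
single_pass_visits = len(list(subtree("Toplvl")))

print(naive_visits, single_pass_visits)
```

For these eight accounts that is 23 visits against 8, and the gap widens as
the tree gets deeper.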

I want to clean that up and what I'm thinking is to recurse through
the tree once totalling up each relevant account and returning those
totals in some structure that contains the accounts and their
totals. Then walk through the tree generating the output table based
on the required depth. This means I'd still be walking a tree
structure twice, but I'd only be doing the per-account math once. I
imagine the first walk would end up returning a list of toplevel
accounts, each member of which would be a cons of that account's
balance and a list of its subaccounts, each member of which would be a
cons... you get the idea.
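
A sketch of the two walks in Python rather than Scheme.  The balances are
made-up integers, and each node is a (name, total, children) tuple where the
description above has a cons of balance and subaccount list; the names
total_tree and render are hypothetical.

```python
# Example tree and made-up balances held directly in each account.
CHILDREN = {"Toplvl": ["A"], "A": ["A1", "A2", "A3", "A4"], "A4": ["A4a", "A4b"]}
OWN = {"A1": 10, "A2": 20, "A3": 30, "A4a": 1, "A4b": 2}

def total_tree(name):
    """First walk (post-order): return (name, total, [child results])."""
    kids = [total_tree(child) for child in CHILDREN.get(name, [])]
    total = OWN.get(name, 0) + sum(kid[1] for kid in kids)
    return (name, total, kids)

def render(node, max_depth, depth=0):
    """Second walk: emit rows down to max_depth without redoing any math."""
    name, total, kids = node
    rows = [("  " * depth + name, total)]
    if depth < max_depth:
        for kid in kids:
            rows.extend(render(kid, max_depth, depth + 1))
    return rows

totals = total_tree("Toplvl")
rows = render(totals, max_depth=2)
for label, total in rows:
    print("%-12s %5d" % (label, total))
```

Cutting the display off at depth 2 hides A4a and A4b, but A4's row still
carries their sum because the totals were fixed in the first walk.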

So before I start hacking my fingers off, does this idea make sense?
(it does to me...) or is there something blatantly obvious that I'm
missing in this general idea? Also, if I'm mis-reading that code,
please let me know, but I think I have the gist of it pretty well.

A

