David,

first of all, thanks for the insight, much appreciated.

David E. Jones wrote:

Jacopo,

I like the idea of this approach, and it's something I've been thinking about for a while.

A good next step might be to do a PoC implementation with maybe just one star schema using a simple-method to do the ETL, and a form widget to report on the data. In that process we'll probably find some things that are ugly and could use some tool extensions.


I agree with you that we should start with a pretty simple PoC implementation. However, while reading the book "The Data Warehouse Toolkit" in my (spare?) time, I'm taking notes about tool extensions that will help to build the data warehouse; here are a few examples of these very preliminary requirements:

1) add support in the entity engine for view-entities that are stored in the db as SQL VIEWs (it seems that we will need many different views over the same shared dimensions; for example, the "Date" dimension could play different roles in the data warehouse)

2) implement util methods to populate the Date dimension (which will store all the days of the years of interest for the analysis) and the Time dimension (all the minutes of a day)

etc...
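
As a rough illustration of point 2, here is a minimal sketch in plain Java of how the rows for the two dimensions could be generated. Everything in it is an assumption made for illustration: the class name DimensionSketch, the row fields, and the yyyyMMdd-style integer key are hypothetical and are not existing OFBiz code, and a real implementation would store the rows through the entity engine instead of returning lists.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DimensionSketch {

    /** One row of a hypothetical Date dimension (field names are illustrative). */
    public static final class DateRow {
        public final int dateKey;      // assumed surrogate key in yyyyMMdd form
        public final LocalDate date;
        public final int year;
        public final int month;
        public final int dayOfMonth;
        public final String dayName;   // e.g. "MONDAY"

        DateRow(LocalDate d) {
            this.dateKey = d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
            this.date = d;
            this.year = d.getYear();
            this.month = d.getMonthValue();
            this.dayOfMonth = d.getDayOfMonth();
            this.dayName = d.getDayOfWeek().toString();
        }
    }

    /** One row of a hypothetical Time dimension with one-minute granularity. */
    public static final class TimeRow {
        public final int timeKey;      // minutes since midnight, 0..1439
        public final int hour;
        public final int minute;

        TimeRow(int minuteOfDay) {
            this.timeKey = minuteOfDay;
            this.hour = minuteOfDay / 60;
            this.minute = minuteOfDay % 60;
        }
    }

    /** Builds one row per calendar day from Jan 1 of firstYear to Dec 31 of lastYear. */
    public static List<DateRow> dateRows(int firstYear, int lastYear) {
        List<DateRow> rows = new ArrayList<>();
        LocalDate end = LocalDate.of(lastYear, 12, 31);
        for (LocalDate d = LocalDate.of(firstYear, 1, 1); !d.isAfter(end); d = d.plusDays(1)) {
            rows.add(new DateRow(d));
        }
        return rows;
    }

    /** Builds one row per minute of a day. */
    public static List<TimeRow> timeRows() {
        List<TimeRow> rows = new ArrayList<>();
        for (int m = 0; m < 24 * 60; m++) {
            rows.add(new TimeRow(m));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(dateRows(2007, 2008).size() + " date rows");
        System.out.println(timeRows().size() + " time rows");
    }
}
```

Both tables are small and fixed once the years of interest are chosen, so they could be filled once, ahead of any fact table load.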

BTW, there are some interesting generic star schemas in the Data Model Resource Book. There is a sales one with some requirements in the form of questions on page 370 of volume 1, and a model diagram on page 371.


I will definitely have a look at them.

The data warehouse entities should go into their own entity group, a new group, with its own datasource in the OOTB entityengine.xml file. A good group name might be something like org.ofbiz.olap (as opposed to oltp, which characterizes most of the current entities).


I agree with everything you write here; by the way, I'd like to clarify that the name "olap" here will be used in its original (generic) meaning of "Online Analytical Processing" and not to designate the underlying technology of the db (which in my initial plan will be a relational db with star schemas and not an olap db with cubes).

Anyway, yeah this would be cool, and we already have a lot of tools that would work really well here.


I will try to write down some notes in the Confluence doc site and see if we can get this ball rolling... do you think that the upcoming conference would be a good chance to speed things up?

Jacopo

-David


On Feb 16, 2007, at 12:15 AM, Jacopo Cappellato wrote:

Christopher,

still nothing official or concrete, as far as I know.
I know that Chris Howe did some integration tests with OpenI:
http://docs.ofbiz.org/x/0AI

I am seriously considering a different approach; while I'm studying the book "The Data Warehouse Toolkit" (http://ofbiz.apache.org/documents.html), I'm trying to draft a proposal for the implementation of base data warehousing features in OFBiz (a separate set of entities for dimensions, facts and star schemas; ETL services based on minilang; tools to manage the dimension tables and synchronization).
You'll find some of my notes here:
http://docs.ofbiz.org/x/2QI
My goal is this: once we have a set of star schemas (facts and dimensions) derived from OFBiz entities and based on best practices, and the tools to manage the data in them, we could integrate a visual reporting tool to run reports against them (or just use the form widgets, with some improvements). It may seem an ambitious plan, but I think that many of the building blocks to complete it are already in the framework; we'll just have to improve and fine-tune them. If you are interested in helping with this, we could try to create a work group for it...

Jacopo

Christopher Snow wrote:
Is there any work going on at the moment to build an ofbiz data warehouse and reporting infrastructure, for example the integration of the Pentaho toolset?
Many thanks ...