Hi John,
I agree with Bryan that we want to compare Derby's behavior to that of
other databases so that we can understand
1) where Derby diverges from the SQL Standard
2) what features we should add to Derby to increase its usefulness
As you note, datatypes are an important compatibility topic. I think
that you are on the right track when you talk about ORM frameworks. I
think that we would understand a great deal about datatype compatibility
by studying how the major ORM frameworks map datatypes onto various
RDBMSes. The engineers at Hibernate, TopLink, etc. have done a lot of
the analysis already. A systematic compilation of their findings would
be useful.
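
For example, such a compilation might boil down to per-dialect tables like the
following rough Java sketch. The entries are purely illustrative, not a
verified survey of either database's type system:

    import java.sql.Types;
    import java.util.HashMap;
    import java.util.Map;

    /**
     * Illustrative sketch of per-database type-name tables, the kind of
     * mapping an ORM dialect maintains. Entries are examples only, not a
     * verified survey.
     */
    public class TypeMappingSketch {

        static Map<Integer, String> derbyTypes() {
            Map<Integer, String> m = new HashMap<Integer, String>();
            m.put(Types.VARCHAR, "VARCHAR");
            m.put(Types.CLOB, "CLOB");
            m.put(Types.VARBINARY, "VARCHAR FOR BIT DATA");
            m.put(Types.DOUBLE, "DOUBLE");
            return m;
        }

        static Map<Integer, String> mysqlTypes() {
            Map<Integer, String> m = new HashMap<Integer, String>();
            m.put(Types.VARCHAR, "VARCHAR");
            m.put(Types.CLOB, "LONGTEXT");
            m.put(Types.VARBINARY, "VARBINARY");
            m.put(Types.DOUBLE, "DOUBLE");
            return m;
        }

        public static void main(String[] args) {
            // Diffing the two maps shows where the dialects diverge,
            // e.g. CLOB vs. LONGTEXT and the FOR BIT DATA syntax.
            System.out.println("Derby: " + derbyTypes());
            System.out.println("MySQL: " + mysqlTypes());
        }
    }
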
Thanks,
-Rick
Ян Программист wrote:
2010/3/17 Ян Программист <[email protected]>:
>> * No actual compatibility tests for Oracle, SQL Server
>
> I imagine that there are some existing compatibility test suites out
> there. It would be interesting to understand Derby's compatibility
> with other databases. This sounds like a task on which a GSoC student
> could make good progress.
>
> Thanks,
> -Rick
>
>
Well, some useful compatibility features to focus on:
 1. Converting data from one database to another while preserving data
    consistency. This is also why special-purpose tools for converting
    between different databases tend to be useless: the actual length
    of the data in table cells differs between systems. One acceptable
    approach is to create and populate additional tables in the
    consuming database with the same column structure (read: the same
    table schema) for rows whose values do not fit the column types of
    the consuming tables. (A minimal JDBC row-copy sketch follows this
    list.)
 2. Converting data in an entity-centric manner: converting data
    between similar column types based on the stored data itself. If
    the database client code contains strategy-switching logic for how
    to interpret the state of the data described in the database, it
    is possible to populate rows in other tables of the consuming
    database without losing data consistency.
 3. Converting types is not always a good idea. Compatibility can
    instead mean that the consuming database reads the rows through a
    special formatter which interprets the data as its own type. You
    can then copy the data as a binary heap and force the consuming
    database to enable that special formatter, even in a production
    environment (the formatter would point at fixed offsets inside
    each cell's data chunk, for every number/word/... of the actual
    data, so that useless binary delimiters never have to be handled).
    But this only works for data types representing the same class
    (numeric-to-numeric, date-to-date). The stored cell data could
    later be converted to the native format, so the special formatter
    can be dropped and the normal formatter used, once the consistency
    of the state of the entities (ORM?) represented by the rows is
    ensured.
 4. Column type compatibility is not index compatibility. The source
    database contains lots of indexes to speed up the JOINs, so the
    consuming database normally has to re-index the consumed rows. An
    index simply records the positions of cells, and those positions
    differ between databases for the same data because of cell data
    sizes and binary delimiters (storage engine details). The
    delimiters are the same within one storage engine, but cell data
    sizes vary from cell to cell. I still do not know how to solve
    this, but it would be awesome to apply, for example, MySQL indexes
    to JavaDB without re-indexing on the JavaDB side.
 5. I see VARBINARY as a candidate for dealing with aggregation in
    ORM. When a conversion is needed for whatever reason, you can
    fetch the entity data described by the ORM classes (where
    aggregated entities are virtualised through referencing columns on
    either table), take the rows of the aggregated table (the property
    set of the entity) and store them in a VARBINARY column of the
    consuming database. Because it is known how the ORM class rows
    will be iterated over in the consumer database's tables, the
    consumer side can read them back with a binary formatter and use
    the data however it wants. Converting the consumer database's
    VARBINARY back into a set of native columns for the aggregated
    entity rows should be fast enough(?). (A serialization sketch
    follows this list.)
 6. All the numeric precision handling. Yes, the ODBC NUMERIC type
    would no longer matter if column compatibility were implemented
    (automatic precision handling when reading from cells of different
    length (digit count), float-to-int conversion, and so on). That
    can help avoid some ODBC code, as well as JDBC code, both related
    to column type handling, when accessing data from non-native
    database clients (accessing JavaDB from C#, or SQLite from Java,
    for example). (A BigDecimal rescaling sketch follows this list.)
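
Here is a minimal sketch of the kind of row copy that (1) and (2) describe,
using plain JDBC. The connection URLs, table and column names are made-up
placeholders; the point is only that the target database's own column types
re-validate every value as it is inserted:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    /**
     * Sketch of copying rows from a source database into a consuming
     * database. All URLs, table and column names are hypothetical.
     */
    public class RowCopySketch {
        public static void main(String[] args) throws Exception {
            try (Connection src = DriverManager.getConnection(
                         "jdbc:mysql://localhost/source_db", "user", "pw");
                 Connection dst = DriverManager.getConnection(
                         "jdbc:derby:consumer_db;create=true")) {
                try (Statement read = src.createStatement();
                     ResultSet rs = read.executeQuery(
                             "SELECT id, name, amount FROM orders");
                     PreparedStatement write = dst.prepareStatement(
                             "INSERT INTO orders (id, name, amount) VALUES (?, ?, ?)")) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong("id"));
                        // A value too long for the target column could instead be
                        // routed to an overflow table with the same schema, as in (1).
                        write.setString(2, rs.getString("name"));
                        write.setBigDecimal(3, rs.getBigDecimal("amount"));
                        write.addBatch();
                    }
                    write.executeBatch();
                }
            }
        }
    }
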
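And a sketch of the VARBINARY idea in (5): the aggregated rows of an entity
are packed into a single binary value on the consumer side and unpacked again
later. Java serialization stands in here for "a binary format both sides
understand"; the entity_blob table and payload column are hypothetical (in
JavaDB the column type would be VARCHAR ... FOR BIT DATA rather than
VARBINARY):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;

    /**
     * Sketch of stashing an entity's aggregated rows into a binary column
     * of the consuming database. Table/column names are hypothetical.
     */
    public class VarbinaryAggregationSketch {

        static byte[] toBytes(List<? extends Serializable> rows) throws Exception {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                // Copy into an ArrayList so the container itself is serializable.
                out.writeObject(new ArrayList<Serializable>(rows));
            }
            return buf.toByteArray();
        }

        @SuppressWarnings("unchecked")
        static List<? extends Serializable> fromBytes(byte[] data) throws Exception {
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(data))) {
                return (List<? extends Serializable>) in.readObject();
            }
        }

        static void store(Connection dst, long entityId,
                          List<? extends Serializable> rows) throws Exception {
            try (PreparedStatement ps = dst.prepareStatement(
                    "INSERT INTO entity_blob (entity_id, payload) VALUES (?, ?)")) {
                ps.setLong(1, entityId);
                ps.setBytes(2, toBytes(rows));
                ps.executeUpdate();
            }
        }

        static List<? extends Serializable> load(Connection dst, long entityId)
                throws Exception {
            try (PreparedStatement ps = dst.prepareStatement(
                    "SELECT payload FROM entity_blob WHERE entity_id = ?")) {
                ps.setLong(1, entityId);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return fromBytes(rs.getBytes(1));
                }
            }
        }
    }
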
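Finally, a small sketch of the "precision auto-dealing" in (6): a DECIMAL
value read from a wider source column is rescaled before it is written into a
narrower target column. The target scale of 2 and the table name are
arbitrary assumptions:

    import java.math.BigDecimal;
    import java.math.RoundingMode;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    /**
     * Sketch of rescaling numeric values so they fit the consuming column.
     * Table/column names and the target scale are hypothetical.
     */
    public class PrecisionSketch {
        static void copyAmounts(Connection src, Connection dst) throws Exception {
            try (Statement read = src.createStatement();
                 ResultSet rs = read.executeQuery("SELECT id, amount FROM measurements");
                 PreparedStatement write = dst.prepareStatement(
                         "UPDATE measurements SET amount = ? WHERE id = ?")) {
                while (rs.next()) {
                    BigDecimal wide = rs.getBigDecimal("amount");
                    // Assume the target column is DECIMAL(10,2): round rather than fail.
                    BigDecimal narrow = wide.setScale(2, RoundingMode.HALF_UP);
                    write.setBigDecimal(1, narrow);
                    write.setLong(2, rs.getLong("id"));
                    write.executeUpdate();
                }
            }
        }
    }
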
*XML persistence driven conversions* of both schema and data, and *"hot"
binary copying* of non-native data from different databases (the special
formatters described above), are the two main goals here.

John