Re: New DBD::File feature - can we do more with it?

2010-06-30 Thread Darren Duncan

H.Merijn Brand wrote:

A bit more thought leads to better portability ...

my $hash = {

meta => {
f_dir   => "data",
f_schema=> undef,
f_ext   => ".csv/r",
f_encoding  => "utf8",
f_lock  => 1,
},


Something very important here, arguably the most important thing to get right at 
the start, is that the API has to be versioned.


We have to start off with a ridiculously simple metaformat, such as a simple 
2-element array.  The first element of this array declares the version of the 
format of the metadata, which is provided as the second element.


For example:

  my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
meta => [['DBI','http://dbi.perl.org/','0.001'],{
  ...
}],
  }) or die DBI->errstr;

Or if that's too complicated, then:

  my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
meta => ['0.001',{
  ...
}],
  }) or die DBI->errstr;

The format here is based on that used by Perl 6 for fully-qualified module 
names, with the parts [base-name, authority, vnum].  Something similar is 
also mandated by Muldis D for starting off code written in it.


For that matter, look at the META.yml files bundled with Perl modules these 
days, which declare the version of the spec they adhere to.


The point is that we will give ourselves a lot of room to evolve cleanly if we 
start using declared versions from the beginning, so that any subsequent version 
of DBI can interpret the user arguments unambiguously and according to their 
intent, because the users are explicitly stating which version of the format 
their input is in.


The second element can even change over time, say from a hash-ref to an 
array-ref or vice-versa, depending on what the first element is.


So I strongly suggest starting from the point of mandating declared versions and 
going from there.
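The version dispatch described above could look something like this sketch. None of these names are existing DBI API; the helper and the assumption that format "0.001" carries a hash-ref payload are purely illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a versioned meta argument of the form
# [version, payload], dying on anything it does not recognize.
sub interpret_meta {
    my ($meta) = @_;
    ref $meta eq "ARRAY" && @$meta == 2
        or die "meta must be a 2-element array ref: [version, payload]\n";
    my ($version, $payload) = @$meta;
    if ($version eq "0.001") {
        # In (assumed) format 0.001 the payload is a hash ref
        ref $payload eq "HASH"
            or die "meta format 0.001 expects a hash ref payload\n";
        return $payload;
    }
    die "unsupported meta format version: $version\n";
}

my $attrs = interpret_meta (["0.001", { f_dir => "data", f_ext => ".csv/r" }]);
print $attrs->{f_dir}, "\n";   # data
```

A later format version could then switch the payload to an array ref, or add keys, without any ambiguity about the caller's intent.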


-- Darren Duncan


Re: New DBD::File feature - can we do more with it?

2010-06-30 Thread H.Merijn Brand
On Wed, 30 Jun 2010 08:37:33 +0200, "H.Merijn Brand"
 wrote:

> On Tue, 29 Jun 2010 19:12:19 +, Jens Rehsack
>  wrote:
> 
> > Hi,
> > 
> > because of a bogus implementation for PRECISION, NULLABLE etc.
> > attributes in DBD::CSV (forced by limitations of SQL::Statement)
> > we have a new attribute for tables in DBD::File meta data:
> > 'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
> > and copies the $stmt->{table_defs} structure (containing column
> > name and some more information - whatever could be specified
> > using (ANSI) SQL to create tables).
> > 
> > Could it make sense to have a DBD::File-supported way to store
> > and load this meta-data (serialized, of course)?
> 
> I'm all in favor of saving data-dictionary info in some persistent
> way, but as Tim said, not by default.
> 
> It should be a user option, and the interface should be configurable.
> DBD::File should support what is installed, but only *if* it is
> installed and available. I personally would prefer JSON, with YAML
> in second place; both fit the bill very well for DBD dict storage,
> and YAML is available from CPAN.
> 
>   my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
>   f_dir  => "data",
>   f_ext  => ".csv/r",
>   f_encoding => "utf8",
>   f_schema   => undef,
>   f_dict => $dict,
>   }) or die DBI->errstr;
> 
> where $dict is
> 1. A hash   see below
> 2. A hashrefref to 1.
> 3. A filename   filename
> 4. A listref[ filename, storage type ]
> 
> The hash/ref from the DDD can be read from or written to file in case
> of 3 or 4. This way we are backward compatible and support more than
> ever before. The content of the hash should be very well documented and
> all errors in it should optionally be ignored.
> 
> "storage type" can be any means of persistence: "JSON", "YAML",
> "Storable", or "Freeze", where "Storable" is the default as it has
> been available in CORE for ages.
> 
> The hash could be something like ...
> 
> my $hash = {
> foo  => [
> [ "bar",[ "numeric", 4 ], [ "not null", "primary key" ] ],
> [ "baz",[ "integer"],   ],
> [ "drz",[ "char",   60 ],   ],
> [ "fbg",[ "numeric", 2, 2 ], [ ], [ "default", 0.0 ],   ],
> [ "c_base", [ "numeric", 4 ], [ "not null" ]],
> ],
> base => [
> [ "c_base", [ "numeric", 4 ], [ "not null", "primary key" ] ],
> [ ...
> :
> "\cA:links" => [
> [ "foo.bar" => "base.c_base" ],
> :

A bit more thought leads to better portability ...

my $hash = {

meta => {
f_dir   => "data",
f_schema=> undef,
f_ext   => ".csv/r",
f_encoding  => "utf8",
f_lock  => 1,
},

tables => {

foo => {
meta => {
file  => "foo.csv",
key   => [ "c_foo" ],
},
fields => [
[ c_foo => {
  NAME   => "c_foo",
  NAME_lc=> "c_foo",
  NAME_uc=> "C_FOO",
  NULLABLE   => 0,
  PRECISION  => 2,
  SCALE  => undef,
  TYPE   => 5,
  TYPE_NAME  => "NUMERIC"
  }],

[ foo => {
  NAME   => "foo",
  NAME_lc=> "foo",
  NAME_uc=> "FOO",
  NULLABLE   => 1,
  PRECISION  => 40,
  SCALE  => undef,
  TYPE   => 12,
  TYPE_NAME  => "VARCHAR"
  }],
:
:
],
},
:
:
},

links => {
"bar.c_foo" => "foo.c_foo",
:
},

indices => {
foo => [
[ "c_foo", "foo DESC" ],
:
],
:
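A dictionary hash like the (abbreviated) one above could be persisted with Storable, which is in core, matching the proposed default storage type. This is a minimal sketch only; the file and directory names are made up:

```perl
use strict;
use warnings;
use Storable   qw( nstore retrieve );
use File::Temp qw( tempdir );

# A shortened dictionary hash in the shape sketched above
my $dict = {
    meta   => { f_dir => "data", f_ext => ".csv/r" },
    tables => {
        foo => { meta => { file => "foo.csv", key => [ "c_foo" ] } },
        },
    };

my $dir = tempdir (CLEANUP => 1);
nstore ($dict, "$dir/ddd.stor");           # write the data dictionary
my $back = retrieve ("$dir/ddd.stor");     # read it back
print $back->{tables}{foo}{meta}{file}, "\n";   # foo.csv
```

Swapping in JSON or YAML would only change the two serialization calls, which is what makes the "storage type" knob cheap to support.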

-- 
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, OpenSuSE 10.3, 11.0, and 11.1, AIX 5.2 and 5.3.
http://mirrors.develooper.com/hpux/   http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/


Re: New DBD::File feature - can we do more with it?

2010-06-30 Thread Jens Rehsack

On 06/30/10 06:37, H.Merijn Brand wrote:

On Tue, 29 Jun 2010 19:12:19 +, Jens Rehsack
  wrote:


Hi,

because of a bogus implementation for PRECISION, NULLABLE etc.
attributes in DBD::CSV (forced by limitations of SQL::Statement)
we have a new attribute for tables in DBD::File meta data:
'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
and copies the $stmt->{table_defs} structure (containing column
name and some more information - whatever could be specified
using (ANSI) SQL to create tables).

Could it make sense to have a DBD::File-supported way to store
and load this meta-data (serialized, of course)?


I'm all in favor of saving data-dictionary info in some persistent
way, but as Tim said, not by default.

It should be a user option, and the interface should be configurable.
DBD::File should support what is installed, but only *if* it is
installed and available. I personally would prefer JSON, with YAML
in second place; both fit the bill very well for DBD dict storage,
and YAML is available from CPAN.

  my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
  f_dir  => "data",
  f_ext  => ".csv/r",
  f_encoding => "utf8",
  f_schema   => undef,
  f_dict => $dict,
  }) or die DBI->errstr;

where $dict is
1. A hash   see below
2. A hashrefref to 1.
3. A filename   filename
4. A listref[ filename, storage type ]

The hash/ref from the DDD can be read from or written to file in case
of 3 or 4. This way we are backward compatible and support more than
ever before. The content of the hash should be very well documented and
all errors in it should optionally be ignored.

"storage type" can be any means of persistence: "JSON", "YAML",
"Storable", or "Freeze", where "Storable" is the default as it has
been available in CORE for ages.

The hash could be something like ...

my $hash = {
foo  => [
[ "bar",[ "numeric", 4 ], [ "not null", "primary key" ] ],
[ "baz",[ "integer"],   ],
[ "drz",[ "char",   60 ],   ],
[ "fbg",[ "numeric", 2, 2 ], [ ], [ "default", 0.0 ],   ],
[ "c_base", [ "numeric", 4 ], [ "not null" ]],
],
base => [
[ "c_base", [ "numeric", 4 ], [ "not null", "primary key" ] ],
[ ...
:
"\cA:links" => [
[ "foo.bar" => "base.c_base" ],
:

that was just a braindump.


I would really like to do this - this would bring us a big step
in the right direction.

DBD::DBM could store it in its meta-data (instead of saving column
names it could save the entire table_defs structure), but what should
DBD::CSV do?

Best regards,
Jens


We talked in IRC about the above ...

Summary: The main intention points to dictionary support for these databases.
This will also require SQL::Statement 2 and an expanded feature set in
DBI::DBD::SqlEngine. Currently I'm unsure how we deal with DBI::SQL::Nano
in that regard (separate mail).

IRC backlog:
<@[Tux]> I'm all in favor of saving data-dictionary info in some persistent way.
<@[Tux]> But as timbo said, not by default
<@[Tux]> it should be a user option, and the interface should be configurable
<@[Tux]> DBD::File should support what is installed, but only *if* it is 
installed

<@[Tux]> and available
<@[Tux]> I personally would prefer JSON, with YAML on second position
<@[Tux]> both very well fit the bill for DBD dict storage
<@[Tux]> YAML is available for CPAN
<@[Tux]> DBI->connect ("dbi:CSV:", undef, undef, { f_dir => "data", f_dict 
=> ... });

<@[Tux]> where ... is
<@[Tux]> 1. A filename
<@[Tux]> 2. A hash
<@[Tux]> 3. A hashref
<@[Tux]> The hash/ref from the DDD can be read from or written to file in 
case of 1.

<@[Tux]> this way we are backward compatible and support more than ever before
* You are now known as Sno|
<@Sno|> timbo? Do you mean Darren?
<@Sno|> [Tux]: your f_dict => parameter list doesn't fit requirements of 
DBD::DBM

<@[Tux]> it was just a basic brain dump
<@[Tux]> all "\cA:..." entries are non-table names in that example, but it 
might be better to change that to
<@[Tux]> { tables => { foo => [ ... ], ... }, links => { ... }, indices => { 
... }};

<@[Tux]> extend to fit requirement of whatevah
<@Sno|> well, what's with the idea to add an optional second header line to 
a csv file?

<@[Tux]> no
<@Sno|> ok, this was clear :)
<@[Tux]> that would mean it is not a CSV file anymore
<@[Tux]> CSV's are MOSTLY from exports and for imports
<@[Tux]> if you alter the content just for DBD, it would defeat all other uses
<@Sno|> ok
<@Sno|> then DBD::File should have at least 2 parameters: f_serializer and 
f_dict

<@[Tux]> why serializer?
<@Sno|> to choose JSON, YAML etc.
<@[Tux]> see option 4
<@[Tux]> f_dict => [ "ddd.json", "JSON" ],
<@Sno|> I'd prefer to handle those separate
<@[Tux]> I don't
<@Sno|> because of, DBD::DBM would not store that in a separate file
<@[Tux]> f_dict =

Re: New DBD::File feature - can we do more with it?

2010-06-29 Thread H.Merijn Brand
On Tue, 29 Jun 2010 19:12:19 +, Jens Rehsack
 wrote:

> Hi,
> 
> because of a bogus implementation for PRECISION, NULLABLE etc.
> attributes in DBD::CSV (forced by limitations of SQL::Statement)
> we have a new attribute for tables in DBD::File meta data:
> 'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
> and copies the $stmt->{table_defs} structure (containing column
> name and some more information - whatever could be specified
> using (ANSI) SQL to create tables).
> 
> Could it make sense to have a DBD::File-supported way to store
> and load this meta-data (serialized, of course)?

I'm all in favor of saving data-dictionary info in some persistent
way, but as Tim said, not by default.

It should be a user option, and the interface should be configurable.
DBD::File should support what is installed, but only *if* it is
installed and available. I personally would prefer JSON, with YAML
in second place; both fit the bill very well for DBD dict storage,
and YAML is available from CPAN.

  my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
  f_dir  => "data",
  f_ext  => ".csv/r",
  f_encoding => "utf8",
  f_schema   => undef,
  f_dict => $dict,
  }) or die DBI->errstr;

where $dict is
1. A hash   see below
2. A hashrefref to 1.
3. A filename   filename
4. A listref[ filename, storage type ]

The hash/ref from the DDD can be read from or written to file in case
of 3 or 4. This way we are backward compatible and support more than
ever before. The content of the hash should be very well documented and
all errors in it should optionally be ignored.

"storage type" can be any means of persistence: "JSON", "YAML",
"Storable", or "Freeze", where "Storable" is the default as it has
been available in CORE for ages.

The hash could be something like ...

my $hash = {
foo  => [
[ "bar",[ "numeric", 4 ], [ "not null", "primary key" ] ],
[ "baz",[ "integer"],   ],
[ "drz",[ "char",   60 ],   ],
[ "fbg",[ "numeric", 2, 2 ], [ ], [ "default", 0.0 ],   ],
[ "c_base", [ "numeric", 4 ], [ "not null" ]],
],
base => [
[ "c_base", [ "numeric", 4 ], [ "not null", "primary key" ] ],
[ ...
:
"\cA:links" => [
[ "foo.bar" => "base.c_base" ],
:

that was just a braindump.
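The four accepted forms of $dict could be normalized by a helper along these lines. This is a sketch only: normalize_f_dict is a hypothetical name, not existing DBD::File API, and the Storable default follows the proposal above:

```perl
use strict;
use warnings;

# Hypothetical helper: map the four accepted f_dict forms to a uniform
# (hash-ref, file name, storage type) triple.
sub normalize_f_dict {
    my ($dict) = @_;
    ref $dict eq "HASH"  and return ($dict, undef, undef);    # 1./2. hash(ref)
    if (ref $dict eq "ARRAY") {                               # 4. [ file, type ]
        my ($file, $type) = @$dict;
        defined $type or $type = "Storable";
        return (undef, $file, $type);
        }
    ref $dict and die "unsupported f_dict value\n";
    return (undef, $dict, "Storable");                        # 3. file name
    }

my ($href, $file, $type) = normalize_f_dict ([ "ddd.json", "JSON" ]);
print "$file $type\n";   # ddd.json JSON
```

With this shape, a bare file name stays backward compatible (Storable is assumed), while the list form lets the user pick the serializer explicitly.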

> I would really like to do this - this would bring us a big step
> in the right direction.
> 
> DBD::DBM could store it in its meta-data (instead of saving column
> names it could save the entire table_defs structure), but what should
> DBD::CSV do?
> 
> Best regards,
> Jens




Re: New DBD::File feature - can we do more with it?

2010-06-29 Thread Jens Rehsack

On 06/29/10 19:58, Darren Duncan wrote:

Jens Rehsack wrote:

because of a bogus implementation for PRECISION, NULLABLE etc.
attributes in DBD::CSV (forced by limitations of SQL::Statement)
we have a new attribute for tables in DBD::File meta data:
'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
and copies the $stmt->{table_defs} structure (containing column
name and some more information - whatever could be specified
using (ANSI) SQL to create tables).

Could it make sense to have a DBD::File-supported way to store
and load this meta-data (serialized, of course)?

I would really like to do this - this would bring us a big step
in the right direction.

DBD::DBM could store it in its meta-data (instead of saving column
names it could save the entire table_defs structure), but what should
DBD::CSV do?


You have several possible options and the short answer is "let the user
tell you".


This doesn't answer the real question:

What's the best way to provide an implementation, and how far should 
DBD::File (part of DBI, remember) go?



For backwards or sideways compatibility, don't store any extra metadata
by default; the only metadata is what can be gleaned from the normal CSV
file itself.


Fair.


Then, if the user tells you to via extra DBI config args, retrieve
or store extra metadata in accordance with those args. For example, the
extra args may give you a second file name in which the meta data is /
is to be stored. Or the extra args may indicate that one of the first
rows in the CSV file contains metadata rather than normal data.

For example, if all of your metadata is column-specific, the CSV file
could contain 2 heading rows instead of the usual one (or none). The
first heading row would be the names of the columns as usual. The new
second heading row would be the encoded type/constraint/etc. definition
that says how to interpret the data of each column, and then the
third and subsequent rows are the data.


These are both interesting points - but I'm unsure whether FreezeThaw can
be used for serialization in the 2nd option - JSON looks like a better
candidate, but is not in core.
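The "support what is installed, but only if it is installed" idea could be sketched as a runtime probe that falls back to core Storable. The module names are real CPAN modules named in this thread; the selection logic itself is illustrative only:

```perl
use strict;
use warnings;

# Illustrative only: try preferred serializer modules in order,
# falling back to Storable, which is always available in core.
sub pick_serializer {
    for my $mod ("JSON", "YAML") {
        return $mod if eval "require $mod; 1";
        }
    return "Storable";
    }

my $serializer = pick_serializer ();
print "using $serializer\n";
```

This keeps JSON/YAML strictly optional dependencies while guaranteeing that some serializer always works out of the box.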


But the point is, let the user tell you how to interpret the particular
CSV files they throw at you, and so DBD::CSV is then more flexible and
compatible.


Users can only choose among the provided choices ;)
What is really interesting to me is to discuss the different choices
and how they impact the different modules DBD::File, DBD::CSV, DBD::DBM,
DBD::PO, and later DBD::AnyData and whatever else may come.

Currently, DBI::DBD::SqlEngine is limited - especially for DBD::AnyData.
The $m->{table_defs} support belongs there, as does some other
table meta-data handling.

Maybe a kind of table API extension (optional, of course) would be
a nice idea to discuss, too.

Jens


Re: New DBD::File feature - can we do more with it?

2010-06-29 Thread Darren Duncan

Jens Rehsack wrote:

because of a bogus implementation for PRECISION, NULLABLE etc.
attributes in DBD::CSV (forced by limitations of SQL::Statement)
we have a new attribute for tables in DBD::File meta data:
'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
and copies the $stmt->{table_defs} structure (containing column
name and some more information - whatever could be specified
using (ANSI) SQL to create tables).

Could it make sense to have a DBD::File-supported way to store
and load this meta-data (serialized, of course)?

I would really like to do this - this would bring us a big step
in the right direction.

DBD::DBM could store it in its meta-data (instead of saving column
names it could save the entire table_defs structure), but what should
DBD::CSV do?


You have several possible options and the short answer is "let the user tell 
you".

For backwards or sideways compatibility, don't store any extra metadata by 
default; the only metadata is what can be gleaned from the normal CSV file itself.


Then, if the user tells you to via extra DBI config args, retrieve or store 
extra metadata in accordance with those args.  For example, the extra args may 
give you a second file name in which the meta data is / is to be stored.  Or the 
extra args may indicate that one of the first rows in the CSV file contains 
metadata rather than normal data.


For example, if all of your metadata is column-specific, the CSV file could 
contain 2 heading rows instead of the usual one (or none).  The first heading 
row would be the names of the columns as usual.  The new second heading row would 
be the encoded type/constraint/etc. definition that says how to interpret the 
data of each column, and then the third and subsequent rows are the data.
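The two-heading-row idea can be illustrated with a toy parser. A real implementation would use Text::CSV_XS; plain split is used here only to keep the sketch dependency-free, and it deliberately ignores quoting and embedded commas:

```perl
use strict;
use warnings;

# Toy illustration of the two-heading-row layout:
#   row 1 = column names, row 2 = encoded type info, rows 3+ = data
my @lines = (
    "c_foo,foo",
    "numeric(4) not null,varchar(40)",
    "1,hello",
    );

my @names = split /,/, $lines[0];
my @types = split /,/, $lines[1];
my %type_of;
@type_of{@names} = @types;          # column name => encoded type

my @rows;
for my $line (@lines[2 .. $#lines]) {
    my %row;
    @row{@names} = split /,/, $line;
    push @rows, \%row;
    }
print "$type_of{c_foo} / $rows[0]{foo}\n";   # numeric(4) not null / hello
```

The encoding of the type row is the open design question: it has to round-trip everything table_defs can hold, which is exactly the caveat raised about such a file no longer being a plain CSV export.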


But the point is, let the user tell you how to interpret the particular CSV 
files they throw at you, and so DBD::CSV is then more flexible and compatible.


I will be taking a similar approach in the near future when implementing my 
Muldis D language over legacy systems such as SQL databases, whose metadata 
generally isn't as rich as mine.  Over a SQL database, I would provide at least 
3 options to users that they specify when connecting to a SQL database using 
Muldis D:

1. There is no extra metadata, and we just reverse-engineer the SQL metadata; 
this is also the degrade-gracefully approach.

2. The extra metadata is stored in tables of another SQL schema of the same 
database, and the user names that schema when connecting; this way, users can 
name their own stuff anything they want, and they pick where my metadata goes, 
out of their way.

3. The extra metadata is stored outside the database in a local file; this 
file is named by the user when connecting.

I think that #2 is the best option for that.


-- Darren Duncan


New DBD::File feature - can we do more with it?

2010-06-29 Thread Jens Rehsack

Hi,

because of a bogus implementation for PRECISION, NULLABLE etc.
attributes in DBD::CSV (forced by limitations of SQL::Statement)
we have a new attribute for tables in DBD::File meta data:
'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
and copies the $stmt->{table_defs} structure (containing column
name and some more information - whatever could be specified
using (ANSI) SQL to create tables).

Could it make sense to have a DBD::File-supported way to store
and load this meta-data (serialized, of course)?

I would really like to do this - this would bring us a big step
in the right direction.

DBD::DBM could store it in its meta-data (instead of saving column
names it could save the entire table_defs structure), but what should
DBD::CSV do?

Best regards,
Jens