Re: New DBD::File feature - can we do more with it?
H.Merijn Brand wrote:
> A bit more thought leads to better portability ...
>
>     my $hash = {
>         meta => {
>             f_dir      => "data",
>             f_schema   => undef,
>             f_ext      => ".csv/r",
>             f_encoding => "utf8",
>             f_lock     => 1,
>             },

Something very important here, arguably the most important thing to get
right at the start, is that the API has to be versioned. We have to
start off with a ridiculously simple metaformat, such as simply a
2-element array. The first element of this array declares the version
of the format of the metadata, which is provided as the second element.
For example:

    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        meta => [['DBI','http://dbi.perl.org/','0.001'],{ ... }],
        }) or die DBI->errstr;

Or if that's too complicated, then:

    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        meta => ['0.001',{ ... }],
        }) or die DBI->errstr;

The format here is based on the one Perl 6 uses for fully-qualified
package or module names, with parts [base-name, authority, vnum].
Something similar is also mandated by Muldis D for starting off code
written in it. For that matter, look at the META.yml files bundled with
Perl modules these days, which declare the version of the spec they
adhere to.

The point is that we give ourselves a lot of room to evolve cleanly if
we use declared versions from the beginning, so that any subsequent
version of DBI can interpret the user arguments unambiguously, to
their intent, because the user is telling it: this input is in this
version of the format. The second element can even change over time,
say from a hashref to an arrayref or vice versa, depending on what the
first element is.

So I strongly suggest starting from the point of mandating declared
versions and going from there. -- Darren Duncan
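The dispatch such a versioned wrapper enables can be sketched in a few lines of Perl. This is illustrative only: the reader function, version string, and per-version payload layout are made up, not an existing DBI API.

```perl
# Sketch: interpret a versioned meta argument [ version, payload ].
# The declared version -- not guesswork -- selects the interpretation.
use strict;
use warnings;

sub interpret_meta {
    my ($meta) = @_;
    ref $meta eq 'ARRAY' && @$meta == 2
        or die "meta must be [ version, payload ]";
    my ($version, $payload) = @$meta;
    if ($version eq '0.001') {
        # hypothetical version 0.001: payload is a plain hashref
        ref $payload eq 'HASH' or die "0.001 payload must be a hashref";
        return $payload;
    }
    die "unsupported meta format version '$version'";
}

my $opts = interpret_meta([ '0.001', { f_dir => "data", f_ext => ".csv/r" } ]);
```

An unknown version is rejected outright instead of being guessed at, which is exactly the room-to-evolve that declared versions buy.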
Re: New DBD::File feature - can we do more with it?
On Wed, 30 Jun 2010 08:37:33 +0200, "H.Merijn Brand" wrote:

> On Tue, 29 Jun 2010 19:12:19 +, Jens Rehsack wrote:
> > Hi,
> >
> > because of a bogus implementation for PRECISION, NULLABLE etc.
> > attributes in DBD::CSV (forced by limitations of SQL::Statement)
> > we have a new attribute for tables in DBD::File meta data:
> > 'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
> > and copies the $stmt->{table_defs} structure (containing column
> > name and some more information - whatever could be specified
> > using (ANSI) SQL to create tables).
> >
> > Could it make sense to have a DBD::File supported way to store
> > and load this meta-data (serialized, of course)?
>
> I'm all in favor of saving data-dictionary info in some persistent
> way, but as Tim said, not by default.
>
> It should be a user option, and the interface should be configurable.
> DBD::File should support what is installed, but only *if* it is
> installed and available. I personally would prefer JSON, with YAML
> in second position; both very well fit the bill for DBD dict storage,
> and YAML is available from CPAN.
>
>     my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
>         f_dir      => "data",
>         f_ext      => ".csv/r",
>         f_encoding => "utf8",
>         f_schema   => undef,
>         f_dict     => $dict,
>         }) or die DBI->errstr;
>
> where $dict is
>   1. A hash (see below)
>   2. A hashref to 1.
>   3. A filename
>   4. A listref [ filename, storage type ]
>
> The hash/ref from the DDD can be read from or written to file in case
> of 3 or 4. This way we are backward compatible and support more than
> ever before. The content of the hash should be very well documented,
> and all errors in it should optionally be ignored.
>
> "storage type" can be any means of persistence: "JSON", "YAML",
> "Storable", "Freeze", where "Storable" is the default as it has been
> available in CORE for ages.
>
> The hash could be something like ...
>     my $hash = {
>         foo => [
>             [ "bar",    [ "numeric", 4 ],    [ "not null", "primary key" ] ],
>             [ "baz",    [ "integer" ],       ],
>             [ "drz",    [ "char", 60 ],      ],
>             [ "fbg",    [ "numeric", 2, 2 ], [ ], [ "default", 0.0 ], ],
>             [ "c_base", [ "numeric", 4 ],    [ "not null" ] ],
>             ],
>         base => [
>             [ "c_base", [ "numeric", 4 ],    [ "not null", "primary key" ] ],
>             [ ...
>             :
>         "\cA:links" => [
>             [ "foo.bar" => "base.c_base" ],
>             :

A bit more thought leads to better portability ...

    my $hash = {
        meta => {
            f_dir      => "data",
            f_schema   => undef,
            f_ext      => ".csv/r",
            f_encoding => "utf8",
            f_lock     => 1,
            },
        tables => {
            foo => {
                meta => {
                    file => "foo.csv",
                    key  => [ "c_foo" ],
                    },
                fields => [
                    [ c_foo => {
                        NAME      => "c_foo",
                        NAME_lc   => "c_foo",
                        NAME_uc   => "C_FOO",
                        NULLABLE  => 0,
                        PRECISION => 2,
                        SCALE     => undef,
                        TYPE      => 5,
                        TYPE_NAME => "NUMERIC",
                        } ],
                    [ foo => {
                        NAME      => "foo",
                        NAME_lc   => "foo",
                        NAME_uc   => "FOO",
                        NULLABLE  => 1,
                        PRECISION => 40,
                        SCALE     => undef,
                        TYPE      => 12,
                        TYPE_NAME => "VARCHAR",
                        } ],
                    :
                    ],
                },
            :
            },
        links => {
            "bar.c_foo" => "foo.c_foo",
            :
            },
        indices => {
            foo => [
                [ "c_foo", "foo DESC" ],
                :
                ],
            :

-- 
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using 5.00307 through 5.12 and porting perl5.13.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, OpenSuSE 10.3, 11.0, and 11.1, AIX 5.2 and 5.3.
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
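The "support what is installed, but only *if* it is installed and available" rule could look like the following sketch, which probes for a preferred serializer at runtime and falls back to core Storable. JSON::PP stands in for "JSON" here because it is in core on modern perls; the function name is made up.

```perl
# Sketch: pick a serializer module for the dictionary, preferring what
# is installed (JSON first, YAML second, per the mail above), falling
# back to Storable, which has been in CORE for ages.
use strict;
use warnings;

sub pick_serializer {
    for my $mod (qw( JSON::PP YAML )) {
        (my $file = "$mod.pm") =~ s{::}{/}g;
        # require() inside eval: probe without dying if not installed
        return $mod if eval { require $file; 1 };
    }
    return 'Storable';   # always available in core
}

my $serializer = pick_serializer();
```

Probing with `eval { require ... }` is the usual idiom for optional dependencies, so DBD::File itself would gain no hard dependency on JSON or YAML.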
Re: New DBD::File feature - can we do more with it?
On 06/30/10 06:37, H.Merijn Brand wrote:

> On Tue, 29 Jun 2010 19:12:19 +, Jens Rehsack wrote:
> > Hi,
> >
> > because of a bogus implementation for PRECISION, NULLABLE etc.
> > attributes in DBD::CSV (forced by limitations of SQL::Statement)
> > we have a new attribute for tables in DBD::File meta data:
> > 'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
> > and copies the $stmt->{table_defs} structure (containing column
> > name and some more information - whatever could be specified
> > using (ANSI) SQL to create tables).
> >
> > Could it make sense to have a DBD::File supported way to store
> > and load this meta-data (serialized, of course)?
>
> I'm all in favor of saving data-dictionary info in some persistent
> way, but as Tim said, not by default.
>
> It should be a user option, and the interface should be configurable.
> DBD::File should support what is installed, but only *if* it is
> installed and available. I personally would prefer JSON, with YAML
> in second position; both very well fit the bill for DBD dict storage,
> and YAML is available from CPAN.
>
>     my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
>         f_dir      => "data",
>         f_ext      => ".csv/r",
>         f_encoding => "utf8",
>         f_schema   => undef,
>         f_dict     => $dict,
>         }) or die DBI->errstr;
>
> where $dict is
>   1. A hash (see below)
>   2. A hashref to 1.
>   3. A filename
>   4. A listref [ filename, storage type ]
>
> The hash/ref from the DDD can be read from or written to file in case
> of 3 or 4. This way we are backward compatible and support more than
> ever before. The content of the hash should be very well documented,
> and all errors in it should optionally be ignored.
>
> "storage type" can be any means of persistence: "JSON", "YAML",
> "Storable", "Freeze", where "Storable" is the default as it has been
> available in CORE for ages.
>
> The hash could be something like ...
>     my $hash = {
>         foo => [
>             [ "bar",    [ "numeric", 4 ],    [ "not null", "primary key" ] ],
>             [ "baz",    [ "integer" ],       ],
>             [ "drz",    [ "char", 60 ],      ],
>             [ "fbg",    [ "numeric", 2, 2 ], [ ], [ "default", 0.0 ], ],
>             [ "c_base", [ "numeric", 4 ],    [ "not null" ] ],
>             ],
>         base => [
>             [ "c_base", [ "numeric", 4 ],    [ "not null", "primary key" ] ],
>             [ ...
>             :
>         "\cA:links" => [
>             [ "foo.bar" => "base.c_base" ],
>             :
>
> that was just a braindump.
>
> > I would really like to do this - this would bring us a big step
> > in the right direction.
> >
> > DBD::DBM could store it in its meta-data (instead of saving column
> > names it could save the entire table_defs structure), but what
> > should DBD::CSV do?
> >
> > Best regards,
> > Jens

We talked in IRC about the above ...

Summary: The main intention points to dictionary support for databases.
This will also require SQL::Statement 2 and an increased feature set in
DBI::DBD::SqlEngine. Currently I'm unsure how we deal with
DBI::SQL::Nano regarding that (separate mail).

IRC backlog:

<@[Tux]> I'm all in favor of saving data-dictionary info in some persistent way.
<@[Tux]> But as timbo said, not by default
<@[Tux]> it should be a user option, and the interface should be configurable
<@[Tux]> DBD::File should support what is installed, but only *if* it is installed
<@[Tux]> and available
<@[Tux]> I personally would prefer JSON, with YAML on second position
<@[Tux]> both very well fit the bill for DBD dict storage
<@[Tux]> YAML is available for CPAN
<@[Tux]> DBI->connect ("dbi:CSV:", undef, undef, { f_dir => "data", f_dict => ... });
<@[Tux]> where ... is
<@[Tux]> 1. A filename
<@[Tux]> 2. A hash
<@[Tux]> 3. A hashref
<@[Tux]> The hash/ref from the DDD can be read from or written to file in case of 1.
<@[Tux]> this way we are backward compatible and support more than ever before
* You are now known as Sno|
<@Sno|> timbo? Do you mean Darren?
<@Sno|> [Tux]: your f_dict => parameter list doesn't fit requirements of DBD::DBM
<@[Tux]> it was just a basic brain dump
<@[Tux]> all "\cA:..." entries are non-table names in that example, but it might be better to change that to
<@[Tux]> { tables => { foo => [ ... ], ... }, links => { ... }, indices => { ... }};
<@[Tux]> extend to fit requirement of whatevah
<@Sno|> well, what's with the idea to add an optional second header line to a csv file?
<@[Tux]> no
<@Sno|> ok, this was clear :)
<@[Tux]> that would mean it is not a CSV file anymore
<@[Tux]> CSV's are MOSTLY from exports and for imports
<@[Tux]> if you alter the content just for DBD, it would defeat all other uses
<@Sno|> ok
<@Sno|> then DBD::File should have at least 2 parameters: f_serializer and f_dict
<@[Tux]> why serializer?
<@Sno|> to choose JSON, YAML etc.
<@[Tux]> see option 4
<@[Tux]> f_dict => [ "ddd.json", "JSON" ],
<@Sno|> I'd prefer to handle those separate
<@[Tux]> I don't
<@Sno|> because of, DBD::DBM would not store that in a separate file
<@[Tux]> f_dict =
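Normalizing the four proposed f_dict forms discussed above could be sketched as follows. The function name and the `(hashref, filename, storage type)` return convention are illustrative only, not an existing DBD::File API.

```perl
# Sketch: accept the four proposed f_dict forms and normalize them to
# ( $dict_hashref, $filename, $storage_type ).
use strict;
use warnings;

sub normalize_f_dict {
    my ($dict) = @_;
    if (ref $dict eq 'HASH') {              # 1./2. a hash(ref)
        return ($dict, undef, undef);
    }
    if (ref $dict eq 'ARRAY') {             # 4. [ filename, storage type ]
        my ($file, $type) = @$dict;
        return (undef, $file, $type || 'Storable');
    }
    if (defined $dict && !ref $dict) {      # 3. a plain filename
        return (undef, $dict, 'Storable');  # Storable as the core default
    }
    die "unsupported f_dict argument";
}

my ($h, $f, $t) = normalize_f_dict([ "ddd.json", "JSON" ]);
```

Folding the storage type into form 4 (as [Tux] prefers) keeps the interface to a single attribute, while a separate f_serializer (as Sno| prefers) would also cover drivers like DBD::DBM that don't use a separate file; the sketch implements the former.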
Re: New DBD::File feature - can we do more with it?
On Tue, 29 Jun 2010 19:12:19 +, Jens Rehsack wrote:

> Hi,
>
> because of a bogus implementation for PRECISION, NULLABLE etc.
> attributes in DBD::CSV (forced by limitations of SQL::Statement)
> we have a new attribute for tables in DBD::File meta data:
> 'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
> and copies the $stmt->{table_defs} structure (containing column
> name and some more information - whatever could be specified
> using (ANSI) SQL to create tables).
>
> Could it make sense to have a DBD::File supported way to store
> and load this meta-data (serialized, of course)?

I'm all in favor of saving data-dictionary info in some persistent
way, but as Tim said, not by default.

It should be a user option, and the interface should be configurable.
DBD::File should support what is installed, but only *if* it is
installed and available. I personally would prefer JSON, with YAML
in second position; both very well fit the bill for DBD dict storage,
and YAML is available from CPAN.

    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_dir      => "data",
        f_ext      => ".csv/r",
        f_encoding => "utf8",
        f_schema   => undef,
        f_dict     => $dict,
        }) or die DBI->errstr;

where $dict is
  1. A hash (see below)
  2. A hashref to 1.
  3. A filename
  4. A listref [ filename, storage type ]

The hash/ref from the DDD can be read from or written to file in case
of 3 or 4. This way we are backward compatible and support more than
ever before. The content of the hash should be very well documented,
and all errors in it should optionally be ignored.

"storage type" can be any means of persistence: "JSON", "YAML",
"Storable", "Freeze", where "Storable" is the default as it has been
available in CORE for ages.

The hash could be something like ...
    my $hash = {
        foo => [
            [ "bar",    [ "numeric", 4 ],    [ "not null", "primary key" ] ],
            [ "baz",    [ "integer" ],       ],
            [ "drz",    [ "char", 60 ],      ],
            [ "fbg",    [ "numeric", 2, 2 ], [ ], [ "default", 0.0 ], ],
            [ "c_base", [ "numeric", 4 ],    [ "not null" ] ],
            ],
        base => [
            [ "c_base", [ "numeric", 4 ],    [ "not null", "primary key" ] ],
            [ ...
            :
        "\cA:links" => [
            [ "foo.bar" => "base.c_base" ],
            :

that was just a braindump.

> I would really like to do this - this would bring us a big step
> in the right direction.
>
> DBD::DBM could store it in its meta-data (instead of saving column
> names it could save the entire table_defs structure), but what
> should DBD::CSV do?
>
> Best regards,
> Jens
Re: New DBD::File feature - can we do more with it?
On 06/29/10 19:58, Darren Duncan wrote:

> Jens Rehsack wrote:
> > because of a bogus implementation for PRECISION, NULLABLE etc.
> > attributes in DBD::CSV (forced by limitations of SQL::Statement)
> > we have a new attribute for tables in DBD::File meta data:
> > 'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
> > and copies the $stmt->{table_defs} structure (containing column
> > name and some more information - whatever could be specified
> > using (ANSI) SQL to create tables).
> >
> > Could it make sense to have a DBD::File supported way to store
> > and load this meta-data (serialized, of course)? I would really
> > like to do this - this would bring us a big step in the right
> > direction.
> >
> > DBD::DBM could store it in its meta-data (instead of saving column
> > names it could save the entire table_defs structure), but what
> > should DBD::CSV do?
>
> You have several possible options and the short answer is "let the
> user tell you".

This doesn't answer the real question: what's the best way to provide
an implementation, and how far should DBD::File (part of DBI,
remember) go?

> For backwards or sideways compatibility, don't store any extra
> metadata by default; the only metadata is what can be gleaned from
> the normal CSV file itself.

Fair.

> Then, if the user tells you to via extra DBI config args, retrieve
> or store extra metadata in accordance with those args. For example,
> the extra args may give you a second file name in which the metadata
> is / is to be stored. Or the extra args may indicate that one of the
> first rows in the CSV file contains metadata rather than normal
> data. For example, if all of your metadata is column-specific, the
> CSV file could contain 2 heading rows instead of the usual one (or
> none). The first heading row would be the name of the column as
> usual. The new second heading row would be the encoded
> type/constraint/etc definition that says how to interpret the data
> of the column, and then the third-plus rows are the data.
These are both interesting points - but I'm unsure whether FreezeThaw
can be used for serialization for the 2nd option. JSON looks like a
better candidate, but is not in core.

> But the point is, let the user tell you how to interpret the
> particular CSV files they throw at you, and so DBD::CSV is then more
> flexible and compatible.

Users can only choose from the provided choices ;)

What is really interesting to me is to discuss the different choices
and how they impact the different modules: DBD::File, DBD::CSV,
DBD::DBM, DBD::PO, later DBD::AnyData, and whatever may come, too.
Currently - especially for DBD::AnyData - DBI::DBD::SqlEngine is
limited. The $m->{table_defs} support belongs there, as well as some
other table meta data handling. Maybe a kind of table API extension
(optional, of course) would be a nice idea to discuss, too.

Jens
Re: New DBD::File feature - can we do more with it?
Jens Rehsack wrote:
> because of a bogus implementation for PRECISION, NULLABLE etc.
> attributes in DBD::CSV (forced by limitations of SQL::Statement)
> we have a new attribute for tables in DBD::File meta data:
> 'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
> and copies the $stmt->{table_defs} structure (containing column
> name and some more information - whatever could be specified
> using (ANSI) SQL to create tables).
>
> Could it make sense to have a DBD::File supported way to store
> and load this meta-data (serialized, of course)? I would really
> like to do this - this would bring us a big step in the right
> direction.
>
> DBD::DBM could store it in its meta-data (instead of saving column
> names it could save the entire table_defs structure), but what
> should DBD::CSV do?

You have several possible options, and the short answer is "let the
user tell you".

For backwards or sideways compatibility, don't store any extra
metadata by default; the only metadata is what can be gleaned from the
normal CSV file itself.

Then, if the user tells you to via extra DBI config args, retrieve or
store extra metadata in accordance with those args. For example, the
extra args may give you a second file name in which the metadata is /
is to be stored. Or the extra args may indicate that one of the first
rows in the CSV file contains metadata rather than normal data. For
example, if all of your metadata is column-specific, the CSV file
could contain 2 heading rows instead of the usual one (or none). The
first heading row would be the name of the column as usual. The new
second heading row would be the encoded type/constraint/etc definition
that says how to interpret the data of the column, and then the
third-plus rows are the data.

But the point is, let the user tell you how to interpret the
particular CSV files they throw at you, and so DBD::CSV is then more
flexible and compatible.
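A minimal sketch of the two-heading-rows idea. A naive split on commas stands in for a real parser such as Text::CSV_XS, so quoting and embedded commas are ignored, and the type encoding in the second row is made up for illustration.

```perl
# Sketch: a CSV with two heading rows -- column names, then encoded
# column types -- followed by the data rows.
use strict;
use warnings;

my @lines = (
    "bar,baz",                        # heading row 1: column names
    "numeric(4) not null,integer",    # heading row 2: encoded types
    "1,10",                           # data starts at row 3
    "2,20",
);

my @names = split /,/, shift @lines;  # consume the name row
my @types = split /,/, shift @lines;  # consume the type row
my %type_of;
@type_of{@names} = @types;            # map column name -> declared type

my @rows = map { [ split /,/ ] } @lines;   # remaining lines are data
```

As [Tux] objects in the IRC log below, a file like this is arguably no longer a plain CSV export, which is why the feature would have to stay strictly opt-in.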
I will be taking a similar approach in the near future when
implementing my Muldis D language over legacy systems such as SQL
databases, whose metadata generally isn't as rich as mine. Over a SQL
database, I would provide at least 3 options that users specify when
connecting to a SQL database using Muldis D:

1. There is no extra metadata; just reverse-engineer the SQL metadata.
This is also the degrade-gracefully approach.

2. The extra metadata is stored in tables of another SQL schema of the
same database, and the user names that schema when connecting; this
way, users can name their own stuff anything they want and they pick
where my metadata goes, out of their way.

3. The extra metadata is stored outside the database in a local file;
this file is named by the user when connecting.

I think that #2 is the best option for that. -- Darren Duncan
New DBD::File feature - can we do more with it?
Hi,

because of a bogus implementation for PRECISION, NULLABLE etc.
attributes in DBD::CSV (forced by limitations of SQL::Statement)
we have a new attribute for tables in DBD::File meta data:
'table_defs'. This is filled when a 'CREATE TABLE ...' is executed
and copies the $stmt->{table_defs} structure (containing column
name and some more information - whatever could be specified
using (ANSI) SQL to create tables).

Could it make sense to have a DBD::File supported way to store
and load this meta-data (serialized, of course)? I would really
like to do this - this would bring us a big step in the right
direction.

DBD::DBM could store it in its meta-data (instead of saving column
names it could save the entire table_defs structure), but what should
DBD::CSV do?

Best regards,
Jens
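One possible shape of "store and load this meta-data, serialized": a sketch using the core Storable module to persist a table_defs-like structure next to the table file. The structure shown is illustrative, not the exact one SQL::Statement builds.

```perl
# Sketch: serialize a table_defs-like structure to disk with Storable
# (in core for ages) and load it back.
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Illustrative stand-in for what a 'CREATE TABLE fruit ...' might yield.
my $table_defs = {
    fruit => {
        columns => [
            { name => "id",   type => "INTEGER", nullable => 0 },
            { name => "name", type => "VARCHAR", precision => 40 },
        ],
    },
};

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "fruit.tdf");

nstore($table_defs, $file);        # write in portable network byte order
my $loaded = retrieve($file);      # round-trip it back into a hashref
```

With something like this, DBD::CSV could keep the .csv file untouched and put the dictionary in a sidecar file, while DBD::DBM could store the same structure in its existing per-table meta-data instead.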