Re: relational data models and Perl 6

2005-12-19 Thread Dave Rolsky

On Thu, 15 Dec 2005, Luke Palmer wrote:


On 12/15/05, Darren Duncan [EMAIL PROTECTED] wrote:

I propose, perhaps redundantly, that Perl 6 include a complete set of
native


Okay, I'm with you here.  Just please stop saying native and core.
Everyone.


Hear, hear.


I would like to hear from Ovid and Dave Rolsky on this issue too, as
they seem to have been researching pure relational models.


My take on this is that with all the features that are supposed to be in 
Perl6, implementing a declarative mini-language for relations and tuples 
would be a simple matter of programming ;)


The following bits seem to me to be key to making this work:

* built-in set operators, which can presumably be overloaded
* powerful OO system
* optional strong typing
* macros and grammar engine

So we can do stuff like this:

 my $emp_rel = Relation.new_from_dbms('Employee');
 my $dep_rel = Relation.new_from_dbms('Department');

 # natural join is a macro of some sort
 my $emp_dep_rel = $emp_rel natural join $dep_rel;

 my @array = $emp_dep_rel.sorted_tuples($emp_rel.attribute('name'));

This is a kind of bastardized semi-logical semi-OO language, but 
presumably we could do this instead


  my $relation = relational($dbh, <<EOF);
  ( ( Employee RENAME name AS employee_name )
    NATURAL JOIN
    ( Department RENAME name AS department_name ) )
  WHERE X.department_name = 'Management'
  EOF

And then presumably the Perl-side of the implementation will have some 
sort of object that does relational things, like:


  my $iterator = $relation.iterator( sort => 'employee_name' );

In fact, there's really no reason this couldn't be done in Perl5 today. 
The hard part is gluing the Perl bits to a DBMS, or alternatively 
implementing all of the relational logic (and therefore an RDBMS) in Perl.
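
A rough sketch of the rename / natural join / restrict pipeline above, written in Python rather than Perl so that it runs today. The `rename` and `natural_join` helpers, the `dept_id` attribute, and the sample rows are all made up for illustration; nothing here talks to a DBMS.

```python
# Relations modelled as lists of dicts; all data below is invented.
employee = [
    {"name": "Alice", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
]
department = [
    {"dept_id": 1, "name": "Management"},
    {"dept_id": 2, "name": "Engineering"},
]

def rename(relation, old, new):
    """Return a copy of the relation with attribute `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in relation]

def natural_join(left, right):
    """Join rows that agree on every attribute name the two relations share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[k] == r[k] for k in common)
    ]

# Rename first so that only dept_id is shared, then join, restrict, and sort.
joined = natural_join(
    rename(employee, "name", "employee_name"),
    rename(department, "name", "department_name"),
)
managers = sorted(
    (row for row in joined if row["department_name"] == "Management"),
    key=lambda row: row["employee_name"],
)
```

Without the two renames, `name` would count as a shared attribute and the join would only match employees who happen to be named after their department.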


Given that there is no truly relational DBMS available, I'm not sure 
exactly what Perl6 can do here, as the hard part is implementing such a 
TRDBMS.  Of course, you could implement one client-side on top of a 
SQL DBMS.  It's just a simple matter of programming ;)



-dave

/*===
VegGuide.Org                    www.BookIRead.com
Your guide to all that's veg.   My book blog
===*/


Re: relational data models and Perl 6

2005-12-18 Thread Piers Cawley
Rob Kinyon [EMAIL PROTECTED] writes:

 On 12/16/05, Ovid [EMAIL PROTECTED] wrote:
 Minor nit:  we're discussing the relational algebra and not the
 relational Calculus (unless the topic changed and I wasn't paying
 attention.  I wouldn't be surprised :)

 Algebra, in general, is a specific form of calculus. So, we're
 speaking of the same thing, just in different terms.

Umm... I think you have that relationship the wrong way around.

-- 
Piers Cawley [EMAIL PROTECTED]
http://www.bofh.org.uk/


Re: relational data models and Perl 6

2005-12-16 Thread Ovid
--- Rob Kinyon [EMAIL PROTECTED] wrote:

 As for the syntactic sugar, I'm not quite sure what should be
 done here. And, with macros, it's not clear that there needs
 to be an authoritative answer. Personally, I'd simply overload
 + for union, - for difference, * for cross-product, / for 
 divide, and so forth.

Bear with me for just a moment here while I provide some background. 
I'll eventually touch on Rob's topic.

One of the issues with handling relations correctly in databases is the
following:

  SELECT emp.name, cust.balance
  FROM   emp, cust
  WHERE  emp.id = cust.age

That's perfectly valid SQL but it doesn't make a lick of sense.  In the
original relational model, that would not be a valid query because the
emp.id would be a different type from the cust.age.  Operations between
different types are simply not allowed.  

However, sometimes it makes sense to allow those operations. 
For example, if cust.id and emp.id are different types but may share
identical and meaningful integer values, you might want to compare
those even though you can't.  So every type must have selectors which
behave more or less like we think of when we try to cast a variable to
a different type. 

So what if, for some crazy reason, we really did want to compare emp.id
to cust.age.  If cust.age is an integer, we might have something like
this pseudo-code:

  WHERE emp.id = EMP_ID(cust.age)

And that makes it all valid.  However, getting this far suggests an
interesting question.  What does the following mean?

  emp1.id + emp2.id

That means absolutely nothing but that's OK in the relational model
because it won't compile.  Why?  Because for any data type you wish to
have, you must define the following:

* The domain of acceptable values (potentially infinite)
* Selectors to cast to and from the value
* Operators and their behaviors

In short, if you don't have a '+' operator defined for a given data
type, you don't have to worry about non-sensical behaviors like the
above.  (Yahoo!'s spell checker tried to change that to non-sensual
behaviors.  I have no further comment.)
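
To make the three requirements concrete, here is a minimal sketch in Python (chosen only because it runs today). The `EmpId` class, its positive-integer domain, and the `to_int` method are assumptions for illustration. Note that Python rejects the bad operations at run time, whereas the relational model described above would reject them at compile time.

```python
class EmpId:
    """A domain whose values are positive integers used as employee ids."""

    def __init__(self, value):
        # Selector: the only way in is from a plain int inside the domain.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            raise ValueError(f"not a valid employee id: {value!r}")
        self._value = value

    def to_int(self):
        # Selector in the other direction: back to the base type.
        return self._value

    def __eq__(self, other):
        # Comparison is only defined between two EmpId values.
        if type(other) is not EmpId:
            raise TypeError("EmpId can only be compared with EmpId")
        return self._value == other._value

    def __hash__(self):
        return hash(self._value)

    # No __add__ is defined, so emp1.id + emp2.id is rejected.


cust_age = 42
emp_id = EmpId(42)

# The explicit selector makes the odd comparison legal, as in EMP_ID(cust.age).
same = emp_id == EmpId(cust_age)

try:
    emp_id + EmpId(7)
    addition_allowed = True
except TypeError:
    addition_allowed = False

try:
    emp_id == cust_age
    mixed_compare_allowed = True
except TypeError:
    mixed_compare_allowed = False
```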

Needless to say, in order to properly apply the relational model, we
wind up with mandatory strong typing and this takes us very far afield
from Perl.  If we skip the strong typing, we may still have something
good, but it won't be the relational model.

Of course, all of this would put us on the doorstep of logic
programming but, if I recall correctly, a decision was already made
that Perl 6 wouldn't be delayed for its inclusion.  A sad, but
necessary choice.

Cheers,
Ovid

-- 
If this message is a response to a question on a mailing list, please send
follow up questions to the list.

Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/


Re: relational data models and Perl 6

2005-12-16 Thread Rob Kinyon
On 12/16/05, Ovid [EMAIL PROTECTED] wrote:
 --- Rob Kinyon [EMAIL PROTECTED] wrote:

  As for the syntactic sugar, I'm not quite sure what should be
  done here. And, with macros, it's not clear that there needs
  to be an authoritative answer. Personally, I'd simply overload
  + for union, - for difference, * for cross-product, / for
  divide, and so forth.

 Bear with me for just a moment here while I provide some background.
 I'll eventually touch on Rob's topic.

 One of the issues with handling relations correctly in databases is the
 following:

   SELECT emp.name, cust.balance
   FROM   emp, cust
   WHERE  emp.id = cust.age

 That's perfectly valid SQL but it doesn't make a lick of sense.  In the
 original relational model, that would not be a valid query because the
 emp.id would be a different type from the cust.age.  Operations between
 different types are simply not allowed.

 However, sometimes it makes sense to allow those operations.
 For example, if cust.id and emp.id are different types but may share
 identical and meaningful integer values, you might want to compare
 those even though you can't.  So every type must have selectors which
 behave more or less like we think of when we try to cast a variable to
 a different type.

 So what if, for some crazy reason, we really did want to compare emp.id
 to cust.age.  If cust.age is an integer, we might have something like
 this pseudo-code:

   WHERE emp.id = EMP_ID(cust.age)

I'm going to interject here with the following:
* P6 has the capability to be optionally strongly-typed. This is
obvious from the type signatures, if nothing else.
* According to the latest metamodel, as I understand it from
Stevan, types are, essentially, classes. This implies that I can
create my own types that inherit from some base type.

Overriding the operators in a generic way so that you have to have an
exact type match before you compare values also, imho, shouldn't be
that hard. So, for the relational calculus, you can have very strong
typing.

 * The domain of acceptable values (potentially infinite)
 * Selectors to cast to and from the value
 * Operators and their behaviors

I would argue that you don't have selectors, by default. You should
have to explicitly add a selector. Otherwise, into C-land you will go,
my son!

 Needless to say, in order to properly apply the relational model, we
 wind up with mandatory strong typing and this takes us very far afield
 from Perl.  If we skip the strong typing, we may still have something
 good, but it won't be the relational model.

If you end up in the relational section, which will be invoked with a
module, then you are choosing to use the typing that is available
through that module. I don't think anyone has argued that the
relational model should be built into the core or should even be a
module included in the core.

In fact, I see at least three modules coming out of this -
Type::Create, Type::Strengthen, and Model::Relational. Of these, I
would think only the Type:: modules should even have a chance to be in
the core distro. Type::Create may be a consequence of the metamodel,
but I'll let Steve or Audrey field that one.

Rob


Re: relational data models and Perl 6

2005-12-16 Thread Ovid
I agree with just about everything you wrote.  I only have two minor
quibbles and they may merely be restatements of what you meant.

--- Rob Kinyon [EMAIL PROTECTED] wrote:

 Overriding the operators in a generic way so that you have
 to have an exact type match before you compare values also, 
 imho, shouldn't be that hard. So, for the relational calculus,
 you can have very strong typing.

Minor nit:  we're discussing the relational algebra and not the
relational Calculus (unless the topic changed and I wasn't paying
attention.  I wouldn't be surprised :)

  * The domain of acceptable values (potentially infinite)
  * Selectors to cast to and from the value
  * Operators and their behaviors
 
 I would argue that you don't have selectors, by default. You
 should have to explicitly add a selector. Otherwise, into
 C-land you will go, my son!

I'm not entirely sure, but I think we agree here.  You have to have, at
minimum, one selector for each new datatype if for no other reason than
to cast a string to your new data type.  Otherwise, your data types
would only be constants because you would have no way of assigning a
value.

Cheers,
Ovid



Re: relational data models and Perl 6

2005-12-16 Thread Rob Kinyon
On 12/16/05, Ovid [EMAIL PROTECTED] wrote:
 Minor nit:  we're discussing the relational algebra and not the
 relational Calculus (unless the topic changed and I wasn't paying
 attention.  I wouldn't be surprised :)

Algebra, in general, is a specific form of calculus. So, we're
speaking of the same thing, just in different terms.

   * The domain of acceptable values (potentially infinite)
   * Selectors to cast to and from the value
   * Operators and their behaviors
 
  I would argue that you don't have selectors, by default. You
  should have to explicitly add a selector. Otherwise, into
  C-land you will go, my son!

 I'm not entirely sure, but I think we agree here.  You have to have, at
 minimum, one selector for each new datatype if for no other reason than
 to cast a string to your new data type.  Otherwise, your data types
 would only be constants because you would have no way of assigning a
 value.

Fair enough. One would need to be able to convert back and forth
between the base type (Int, String, etc) and the type.

Rob


Re: relational data models and Perl 6

2005-12-15 Thread Darren Duncan

At 2:54 AM + 12/15/05, Luke Palmer wrote:

On 12/15/05, Darren Duncan [EMAIL PROTECTED] wrote:

 I propose, perhaps redundantly, that Perl 6 include a complete set of
 native


Okay, I'm with you here.  Just please stop saying native and core.
 Everyone.


Yes, of course.  What I meant was that I considered relational data 
important enough for common programming to be considered by the Perl 
6 language designers, so that the language allows for it to be 
elegantly represented and processed.  The implementation details 
aren't that important.



I would like to hear from Ovid and Dave Rolsky on this issue too, as
they seem to have been researching pure relational models.


As am I now.  My own database access framework in development is 
evolving to be centered more around an ideal relational model rather 
than simply what SQL or existing databases define.  It does any 
serious database developer good to be familiar with what the 
relational model actually says, and not just what tangential things 
have actually been implemented by various vendors.  The sources I 
cited are good reference and/or explanatory materials.



  Essentially it comes down to better handling of data sets.

Cool.  I've recently been taken by list comprehensions, and I keep
seeing set comprehensions in my math classes.  Maybe we can steal
some similar notation.


You probably could; the terms used in relational theory are mostly or 
entirely from mathematics.  (I could stand to learn more about those 
maths too.)



Hmm.  I would say it's a hash not so much.  For instance, the
difference between an array and a tuple in many languages is that an
array is homogeneously-typed--that's what allows you to access it
using runtime values (integers).  Tuples are heterogeneously-typed, so
you can't say

my $idx = get_input();
say $tuple[$idx];

(Pretend that Perl 6 is some other language :-), because the compiler
can't know what type it's going to say.

In the same way, I see a hash as homogeneously-typed, because you can
index it by strings.  What you're referring to as a tuple here would
be called a record or a struct in most languages.


Yes, you are right; a Tuple is very much a record or a struct; I 
just didn't use those because Perl doesn't have them per se; the 
closest thing that Perl has is the object, which you could say is 
exactly equivalent.



* a Relation is an unordered set of Tuples, where every Tuple has

 the same definition, as if the Relation were akin to a specific Perl
 class and every Tuple in it were akin to a Perl object of that class


When you say unordered set (redundantly, of course), can this set be
infinite?  That is, can I consider this relation (using made-up set
comprehension notation):

{ ($x,$y) where $x  $y (in) Int, $x = $y }

And do stuff with it?


Yes you can.  A set can be infinite.  For example, the set of INTEGER 
contains every whole number from negative infinity to positive 
infinity.  At the same time, this set excludes all fractional numbers 
and all data that is not a number, such as characters.  This only 
becomes finite when you place bounds on the range, such as saying it 
has to be between +/- 2 billion.
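
One way to "do stuff" with an infinite set is to store the membership rule rather than the members. The sketch below is in Python; the `PredicateSet` class is hypothetical, and the condition `x <= y` is an assumption about what Luke's made-up notation was meant to say.

```python
class PredicateSet:
    """A set defined by a membership rule instead of a list of elements."""

    def __init__(self, predicate):
        self._predicate = predicate

    def __contains__(self, item):
        return bool(self._predicate(item))

    def restrict(self, extra):
        # Narrowing the rule gives a new set; nothing is enumerated.
        return PredicateSet(lambda item: self._predicate(item) and extra(item))


def is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)


# Every pair of integers (x, y) with x <= y: infinitely many members.
ordered_pairs = PredicateSet(
    lambda pair: isinstance(pair, tuple)
    and len(pair) == 2
    and is_int(pair[0])
    and is_int(pair[1])
    and pair[0] <= pair[1]
)

# Placing bounds on the range makes the set finite.
LIMIT = 2_000_000_000
bounded = ordered_pairs.restrict(lambda pair: -LIMIT <= pair[0] and pair[1] <= LIMIT)
```

Membership tests and restriction work fine on such a set; anything that needs to enumerate it (counting, sorting, printing every tuple) only becomes possible once it is bounded.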



  Specifically what I would like to see added to Perl, if that doesn't

 already exist, is a set of operators that work on Relations, like set
 operations, such as these (these bulleted definitions from Database
 in Depth, 1.3.3, some context excluded):

   * Restrict - Returns a relation containing all tuples from a
 specified relation that satisfy a specified condition. For example,
 we might restrict relation EMP to just the tuples where the DNO value
 is D2.


Well, if we consider a relation to be a set, then we can use the set 
operations:


my $newrel = $emp.grep: { .DNO === 'D2' };

I don't know what EMP, DNO, and D2 are...


Part of the context I excluded before, from section 1.3.1, is that 
the author is talking about hypothetical DEPT (Department) and EMP 
(Employee) relations (tables); DEPT has the attributes [DNO, DNAME, 
BUDGET], and EMP has the attributes [ENO, ENAME, DNO, SALARY]; 
DEPT.DNO is referenced by EMP.DNO; DEPT.DNO and EMP.ENO are primary 
keys in their respective relations.


So the restrict example is like, as you said, but with EMP an object:

  my $NEWREL = $EMP.grep:{ $.DNO eq 'D2' };

A SQLish equivalent would be:

  INSERT INTO NEWREL SELECT * FROM EMP WHERE DNO = 'D2';


* Project - Returns a relation containing all (sub)tuples that

 remain in a specified relation after specified attributes have been
 removed. For example, we might project relation EMP on just the ENO
 and SALARY attributes.


Hmm...  Well, if we pretend that records and hashes are the same thing
for the moment, then:

my $newrel = $emp.map: { .:<ENO SALARY> };

(See the new S06 for a description of the .: syntax)


Or with EMP an object:

  my $NEWREL = $EMP.map:{ $_.class.new( ENO => $_.ENO, SALARY => $.SALARY ) };

SQLish:

  INSERT INTO NEWREL (ENO, 

Re: relational data models and Perl 6

2005-12-15 Thread Darren Duncan

As an addendum to what I said before ...

The general kind of thing I am proposing for Perl 6 to have is a 
declarative syntax for more kinds of tasks, where you can simply 
specify *what* you want to happen, and you don't have to tell Perl 
how to perform that task.


An example of declaratives that is already specified is 
hyper-operators; you don't have to tell Perl how to iterate through 
various lists or divide up tasks.


I would want the set operations for tuples to be like that, but the 
example code that Luke and I expressed already, with maps and greps 
etc, seems to smack too much of telling Perl how to do the job.


I don't want to have to use maps or greps or whatever, to express the 
various relational operations.


-- Darren Duncan


Re: relational data models and Perl 6

2005-12-15 Thread Xavier Noria

On Dec 15, 2005, at 2:19, Darren Duncan wrote:

 * a Tuple is an associative array having one or more Attributes,  
and each Attribute has a name or ordinal position and it is typed  
according to a Domain;
this is like a restricted Hash in a way, where each key has a  
specific type


 * a Relation is an unordered set of Tuples, where every Tuple has  
the same definition, as if the Relation were akin to a specific  
Perl class and every Tuple in it were akin to a Perl object of that  
class


Something that puzzled me in Database in Depth is that jargon,  
supposedly math-based. A relation in math is just a subset of a  
Cartesian product, and a tuple is an element of a relation. So it's  
standard for a Relation type to be a set of Tuples, but a tuple  
itself is not a set (as are tuples in the book, argh). So if  
something unordered like that goes into the language to mimic that  
model I wouldn't call it Tuple.


Math conventions there are well established, the jargon in Database  
in Depth departs from them and I don't think it is a good idea to  
adopt it.


-- fxn



Re: relational data models and Perl 6

2005-12-15 Thread Ruud H.G. van Tol
Darren Duncan schreef:

 If you take ...

   +-+-+
   |a|x|
   |a|y|
   |a|z|
   |b|x|
   |c|y|
   +-+-+

 ... and divide it by ...

   +-+
   |x|
   |z|
   +-+

 ... the result is ...

   +-+
   |a|
   +-+

 I'm not sure if Divide has an equivalent in SQL.


A verbose way to do it:

SELECT    C_abc
FROM      T_abc_xyz NATURAL INNER JOIN T_xz
GROUP BY  C_abc
HAVING    Count(T_abc_xyz.C_xyz)
        = (SELECT Count(*) FROM T_xz);

This basically filters the INNER JOIN result-set to only keep those
subsets that have the required number of rows.

It requires that the rows of each table are unique, so there cannot be
another (b,x) in T_abc_xyz.
That is a normal requirement.
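
The same division can be written directly with sets, which also shows why uniqueness comes for free there. This is a Python sketch; the `divide` function is hypothetical, and the data is the example from the tables above.

```python
def divide(dividend, divisor):
    """Divide a binary relation (set of pairs) by a unary one (set of values).

    Keeps each left-hand value that is paired with every value in the divisor.
    """
    candidates = {left for left, _ in dividend}
    return {
        left
        for left in candidates
        if all((left, right) in dividend for right in divisor)
    }


t_abc_xyz = {("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("c", "y")}
t_xz = {"x", "z"}

result = divide(t_abc_xyz, t_xz)
```

Because a Python set cannot hold the same pair twice, a second (b,x) simply cannot occur, so no counting trick is needed.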

-- 
Grtz, Ruud



Re: relational data models and Perl 6

2005-12-15 Thread Dave Whipp

Darren Duncan wrote:

As an addendum to what I said before ...

...
I would want the set operations for tuples to be like that, but the 
example code that Luke and I expressed already, with maps and greps etc, 
seems to smack too much of telling Perl how to do the job.


I don't want to have to use maps or greps or whatever, to express the 
various relational operations.


I think you're reading too many semantics into C<map> and C<grep>: they 
don't tell perl *how* to implement the search, any more than 
C<sql:where> would. The example was:


  INSERT INTO NEWREL SELECT * FROM EMP WHERE DNO = 'D2';
Vs
  my $NEWREL = $EMP.grep:{ $.DNO eq 'D2' };

The implementation of $EMP.grep depends very much on the class of $EMP. 
If this is an array-ref, then it is reasonable to think that the grep 
method would iterate the array in-order. However, if the class is 
unordered set, then there is no such expectation on the implementation.


The deeper problem is probably the use of the eq operator in the test. 
Without knowing a-priori what operations (greps) will be performed on 
the relation, it is not possible to optimize the data structure for 
those specific operations. For example, if we knew that $EMP should 
store its data based on the {$.DNO eq 'D2'} equivalence class then this 
grep would have high performance (possibly at the expense of its creation).
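
A small sketch of that trade-off, in Python for the sake of something runnable. The `IndexedRelation` class and its `where_eq` and `grep` methods are hypothetical: the class keeps an equality index on one attribute, so an equality test on that attribute is a lookup, while an opaque code block still has to be applied to every tuple.

```python
from collections import defaultdict


class IndexedRelation:
    """An unordered collection of tuples (dicts) with an equality index on one attribute."""

    def __init__(self, indexed_attr):
        self._indexed_attr = indexed_attr
        self._rows = []
        self._index = defaultdict(list)  # attribute value -> rows

    def insert(self, row):
        # Creation pays the cost of maintaining the index.
        self._rows.append(row)
        self._index[row[self._indexed_attr]].append(row)

    def where_eq(self, attr, value):
        # Equality tests on the indexed attribute skip the scan.
        if attr == self._indexed_attr:
            return list(self._index.get(value, []))
        return [row for row in self._rows if row[attr] == value]

    def grep(self, predicate):
        # An opaque code block can only be applied row by row.
        return [row for row in self._rows if predicate(row)]


emp = IndexedRelation("DNO")
for eno, dno in [("E1", "D1"), ("E2", "D2"), ("E3", "D2")]:
    emp.insert({"ENO": eno, "DNO": dno})

by_index = emp.where_eq("DNO", "D2")
by_scan = emp.grep(lambda row: row["DNO"] == "D2")
```

Both calls return the same tuples; the difference is only in what the implementation is able to know about the test before running it.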


In theory, a sufficiently magical module could examine the parse tree 
(post type-inference), and find all the calls to C<grep> on everything 
that's a tuple -- and use that to attempt optimizations of a few special 
cases (e.g. a code block that contains just an eq test against an 
attribute). I'm not sure how practical this would be, but I don't see 
how a different syntax (e.g. s/grep/where/) would be more 
declarative in a way that makes this task any easier.


Re: relational data models and Perl 6

2005-12-15 Thread Rob Kinyon
[snip entire conversation so far]

(Please bear with me - I'm going to go in random directions.)

Someone please correct me if I'm wrong, but it seems that there are only
a few things missing in P6:
1) An elegant way of creating a tuple-type (the table, so to speak)
2) A way of providing constraints across the actual tuples of a
given tuple-type
3) Syntactic sugar for performing the relational calculus

To me, a tuple-type is more than a class in the standard OO. It has to
be able to apply any constraints that might be upon the tuple-type,
such as uniqueness of a given element across all tuples or foreign-key
constraints. While this is certainly possible using the P6 OO
constructs, it would make sense for a baseclass to provide this
functionality.

Actually, this is a really great place for metaclasses to shine. The
actual tuple-type needs to be constructed from some class-constructor
(which would be, in the metamodel, itself a class). This is so that it
has the appropriate types for the elements of the tuple along with any
necessary constraints upon the tuples / elements of the tuples.
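
As a rough illustration of a tuple-type that knows about all of its tuples, here is a Python sketch. The `TupleType` class, its `create` method, and the `unique` parameter are hypothetical names; the point is only that the constraint check needs access to every existing instance, which a plain class does not give you.

```python
class TupleType:
    """Holds every tuple of one type and checks constraints across them."""

    def __init__(self, attributes, unique=()):
        self.attributes = dict(attributes)  # attribute name -> Python type
        self.unique = tuple(unique)         # attributes that must be unique
        self.tuples = []

    def create(self, **values):
        # Per-tuple checks: the right attribute names with the right types.
        if set(values) != set(self.attributes):
            raise ValueError("attribute names do not match the tuple-type")
        for name, expected in self.attributes.items():
            if not isinstance(values[name], expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        # Cross-tuple check: uniqueness over all existing tuples.
        for name in self.unique:
            if any(existing[name] == values[name] for existing in self.tuples):
                raise ValueError(f"duplicate value for unique attribute {name}")
        self.tuples.append(values)
        return values


department = TupleType({"DNO": str, "DNAME": str}, unique=["DNO"])
department.create(DNO="D1", DNAME="Sales")

try:
    department.create(DNO="D1", DNAME="Marketing")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```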

In addition, you're going to want to take actions not just on the
tuple, but on the entire tuple-type. That screams class-level methods
that operate across all instances of the class. Maybe, a set of roles
would be good for organizing this kind of across-all-instances
behavior that the tuple-type can take advantage of. I'm sure that this
wouldn't be limited to just the relational calculus.

As for the syntactic sugar, I'm not quite sure what should be done
here. And, with macros, it's not clear that there needs to be an
authoritative answer. Personally, I'd simply overload + for union, -
for difference, * for cross-product, / for divide, and so forth.
There's been some discussion with sets as to creating new operators
using the set-operators that come in Unicode. As tuples and relations
among tuples aren't necessarily sets, those might not be appropriate.
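
For what the overloading might feel like in practice, here is a sketch in Python, which has the same kind of operator overloading. The `Relation` class is hypothetical, it treats a relation as a set of rows, and `/` for divide is left out to keep it short.

```python
class Relation:
    """A relation as a frozenset of rows, each row a frozenset of (attribute, value) pairs."""

    def __init__(self, rows=()):
        self.rows = frozenset(frozenset(dict(row).items()) for row in rows)

    @classmethod
    def _from_frozen(cls, rows):
        rel = cls()
        rel.rows = frozenset(rows)
        return rel

    def __add__(self, other):  # + as union
        return Relation._from_frozen(self.rows | other.rows)

    def __sub__(self, other):  # - as difference
        return Relation._from_frozen(self.rows - other.rows)

    def __mul__(self, other):  # * as cross product (attribute names must not clash)
        return Relation._from_frozen(a | b for a in self.rows for b in other.rows)

    def __eq__(self, other):
        return isinstance(other, Relation) and self.rows == other.rows

    def __hash__(self):
        return hash(self.rows)


r1 = Relation([{"id": 1}, {"id": 2}])
r2 = Relation([{"id": 2}, {"id": 3}])
colors = Relation([{"color": "red"}, {"color": "blue"}])

union = r1 + r2
difference = r1 - r2
product = r1 * colors
```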

It also seems clear that junctionish iterators may be of use here. For
example, "Give me all the tuples that match these criteria" might
return an iterator that also acts as an any-junction.

It could also return a class object that has a different set of
instances marked as created from it. Though, I'm not too sure how that
would work when asking a given instance "who is the class object that
created you?" ... maybe it returns the initial one or maybe it returns
them all? I think the initial one is more correct, as the others are
just subsets. When dealing with SQL, I don't care about the subsets
that a given row belongs to - I only care about the table. So, maybe
the subset class objects delegate all methods to the original class
object except for those that deal with "Who do you have?" and "Give me
a subset where ..."

Also, joins between tuple-types would have to create a new tuple-type,
with the tuples within being delegators to the underlying tuples? I'm
not sure that this (or any other) derived tuple-type class object
should be allowed to create new tuples (though I'm sure someone can
think of a good reason why I'm wrong).

Again, just a bunch of meandering thoughts. Bonus points to whoever
can help me bridge the gap between what I just blathered and an
elegant solution to Sudoku.

Rob


Re: relational data models and Perl 6

2005-12-15 Thread Dr.Ruud
Ruud H.G. van Tol schreef:

 [RD-interface]

See also these Haskell Hierarchical Libraries (base package)
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Set.html
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html

-- 
Affijn, Ruud

Gewoon is een tijger.




relational data models and Perl 6

2005-12-14 Thread Darren Duncan

All,

P.S. What follows is rough and will be smoothed out or reworked.

I propose, perhaps redundantly, that Perl 6 include a complete set of 
native language constructs for a relational data model, akin to that 
introduced in E. F. Codd's classic paper, A Relational Model of Data 
for Large Shared Data Banks (a copy of which is at 
http://www.acm.org/classics/nov95/toc.html ), and also discussed at 
length in such books as C. J. Date's Database in Depth (O'Reilly, 
2005).  Codd's paper itself (see 1.5) says that the necessary pieces 
are good candidates for a sub-language of any typical programming 
language.


The actual relational data model (which is not the same as SQL per 
se) is expressible in terms of mathematics, such as sets and 
predicate calculus, and therefore I believe that Perl 6 already has 
most of what is needed in the language.


Essentially it comes down to better handling of data sets.

It is very possible, then, that all which may be necessary is an 
extension of the standard data types, or operators, or builtin 
functions, and/or utilization of the Perl 6 object model.


What I would like, for example, are standard data types which are 
akin to Relations/RelVars/etc (tables/rowsets), Tuples (rows), 
Attributes (fields), Sets (enums), Domains (data types) and such. 
Largely these already map to existing Perl 6 entities:


 * a Domain is like a class that defines a set of possible values, 
and each value can be multi-part; equal to a perl Class


 * an Attribute stores a value which is a perl Object

 * a Tuple is an associative array having one or more Attributes, and 
each Attribute has a name or ordinal position and it is typed 
according to a Domain;

this is like a restricted Hash in a way, where each key has a specific type

 * a Relation is an unordered set of Tuples, where every Tuple has 
the same definition, as if the Relation were akin to a specific Perl 
class and every Tuple in it were akin to a Perl object of that class


Fairly standard so far.

Specifically what I would like to see added to Perl, if that doesn't 
already exist, is a set of operators that work on Relations, like set 
operations, such as these (these bulleted definitions from Database 
in Depth, 1.3.3, some context excluded):


 * Restrict - Returns a relation containing all tuples from a 
specified relation that satisfy a specified condition. For example, 
we might restrict relation EMP to just the tuples where the DNO value 
is D2.


 * Project - Returns a relation containing all (sub)tuples that 
remain in a specified relation after specified attributes have been 
removed. For example, we might project relation EMP on just the ENO 
and SALARY attributes.


 * Product - Returns a relation containing all possible tuples that 
are a combination of two tuples, one from each of two specified 
relations. Product is also known variously as cartesian product, 
cross product, cross join, and cartesian join (in fact, it's just a 
special case of join, as we'll see in Chapter 5).


 * Intersect - Returns a relation containing all tuples that appear 
in both of two specified relations. (Actually, intersect also is a 
special case of join.)


 * Union - Returns a relation containing all tuples that appear in 
either or both of two specified relations.


 * Difference - Returns a relation containing all tuples that appear 
in the first and not the second of two specified relations.


 * Join - Returns a relation containing all possible tuples that are 
a combination of two tuples, one from each of two specified 
relations, such that the two tuples contributing to any given result 
tuple have a common value for the common attributes of the two 
relations (and that common value appears just once, not twice, in 
that result tuple).  NOTE, This kind of join was originally called 
the natural join. Since natural join is far and away the most 
important kind, however, it's become standard practice to take the 
unqualified term join to mean the natural join specifically, and I'll 
follow that practice in this book.


 * Divide - Takes two relations, one binary and one unary, and 
returns a relation consisting of all values of one attribute of the 
binary relation that match (in the other attribute) all values in the 
unary relation.
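
To show how little machinery the operators need, here is a sketch of restrict, project and join over plain sets, in Python so that it can be run as is. The helper names and the sample values are invented; the attribute names follow the EMP/DEPT example. It also checks the two claims in the list that intersect and product are special cases of join.

```python
def row(**attrs):
    """Build a hashable tuple (row) from attribute names and values."""
    return frozenset(attrs.items())


def restrict(relation, condition):
    return {r for r in relation if condition(dict(r))}


def project(relation, names):
    return {frozenset((k, v) for k, v in r if k in names) for r in relation}


def join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[k] == rd[k] for k in ld.keys() & rd.keys()):
                result.add(l | r)  # a shared value appears once, not twice
    return result


# Sample values are invented.
EMP = {
    row(ENO="E1", ENAME="Ann", DNO="D1", SALARY=40),
    row(ENO="E2", ENAME="Raj", DNO="D2", SALARY=42),
    row(ENO="E3", ENAME="Mei", DNO="D2", SALARY=30),
}
DEPT = {
    row(DNO="D1", DNAME="Marketing", BUDGET=10),
    row(DNO="D2", DNAME="Development", BUDGET=12),
}

in_d2 = restrict(EMP, lambda t: t["DNO"] == "D2")
pay = project(EMP, {"ENO", "SALARY"})
emp_dept = join(EMP, DEPT)

# With identical headings, join degenerates to intersection.
same_heading = join(in_d2, EMP)
# With no shared attributes, join degenerates to product.
product = join(project(EMP, {"ENO"}), project(DEPT, {"DNAME"}))
```

Union and difference are the built-in `|` and `-` on these sets, so they need no helper at all.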


Now, everything I've described could be implemented as a Perl 6 module, 
and if necessary I can do this for illustrative purposes, but I 
believe that this is essentially simple and that something analogous 
should be included in the core language, for similar reasons that 
junctions and PDL are.


I also want to make clear that this functionality is entirely about 
better support for data processing with Perl native variables, and 
has nothing to do with external data repositories such as SQL 
databases.  Though I anticipate that one could extend or override 
built-ins so that they interact with remote databases instead of 
internal variables, such as with the concept of sub-classing or role 

Re: relational data models and Perl 6

2005-12-14 Thread Luke Palmer
On 12/15/05, Darren Duncan [EMAIL PROTECTED] wrote:
 I propose, perhaps redundantly, that Perl 6 include a complete set of
 native

Okay, I'm with you here.  Just please stop saying native and core.
 Everyone.

<rant>
Remember, syntax in Perl 6 can be stuffed in a library like anything
else.  You don't get much out of making it truly core (that is, in
the language without any use statements) besides the fact that you tie
yourself to all the idiosyncrasies of a single implementation instead
of allowing multiple.

Let's talk about what the module would look like, and then in a
different discussion talk about which modules are used by default.
</rant>

 language constructs for a relational data model, akin to that
 introduced in E. F. Codd's classic paper, A Relational Model of Data
 for Large Shared Data Banks (a copy of which is at
 http://www.acm.org/classics/nov95/toc.html ), and also discussed at
 length in such books as C. J. Date's Database in Depth (O'Reilly,
 2005).  Codd's paper itself (see 1.5) says that the necessary pieces
 are good candidates for a sub-language of any typical programming
 language.

I would like to hear from Ovid and Dave Rolsky on this issue too, as
they seem to have been researching pure relational models.

 Essentially it comes down to better handling of data sets.

Cool.  I've recently been taken by list comprehensions, and I keep
seeing set comprehensions in my math classes.  Maybe we can steal
some similar notation.

 What I would like, for example, are standard data types which are
 akin to Relations/RelVars/etc (tables/rowsets), Tuples (rows),
 Attributes (fields), Sets (enums), Domains (data types) and such.
 Largely these already map to existing Perl 6 entities:

   * a Domain is like a class that defines a set of possible values,
 and each value can be multi-part; equal to a perl Class

   * an Attribute stores a value which is a perl Object

   * a Tuple is an associative array having one or more Attributes, and
 each Attribute has a name or ordinal position and it is typed
 according to a Domain;
 this is like a restricted Hash in a way, where each key has a specific type

Hmm.  I would say it's a hash not so much.  For instance, the
difference between an array and a tuple in many languages is that an
array is homogeneously-typed--that's what allows you to access it
using runtime values (integers).  Tuples are heterogeneously-typed, so
you can't say

my $idx = get_input();
say $tuple[$idx];

(Pretend that Perl 6 is some other language :-), because the compiler
can't know what type it's going to say.

In the same way, I see a hash as homogeneously-typed, because you can
index it by strings.  What you're referring to as a tuple here would
be called a record or a struct in most languages.

   * a Relation is an unordered set of Tuples, where every Tuple has
 the same definition, as if the Relation were akin to a specific Perl
 class and every Tuple in it were akin to a Perl object of that class

When you say unordered set (redundantly, of course), can this set be
infinite?  That is, can I consider this relation (using made-up set
comprehension notation):

{ ($x,$y) where $x  $y (in) Int, $x = $y }

And do stuff with it?

 Specifically what I would like to see added to Perl, if that doesn't
 already exist, is a set of operators that work on Relations, like set
 operations, such as these (these bulleted definitions from Database
 in Depth, 1.3.3, some context excluded):

   * Restrict - Returns a relation containing all tuples from a
 specified relation that satisfy a specified condition. For example,
 we might restrict relation EMP to just the tuples where the DNO value
 is D2.

Well, if we consider a relation to be a set, then we can use the set operations:

my $newrel = $emp.grep: { .DNO === 'D2' };

I don't know what EMP, DNO, and D2 are...

   * Project - Returns a relation containing all (sub)tuples that
 remain in a specified relation after specified attributes have been
 removed. For example, we might project relation EMP on just the ENO
 and SALARY attributes.

Hmm...  Well, if we pretend that records and hashes are the same thing
for the moment, then:

my $newrel = $emp.map: { .:<ENO SALARY> };

(See the new S06 for a description of the .: syntax)

   * Union - Returns a relation containing all tuples that appear in
 either or both of two specified relations.

Already have it:

$rel1 (+) $rel2

   * Difference - Returns a relation containing all tuples that appear
 in the first and not the second of two specified relations.

Already have it:

$rel1 (-) $rel2

   * Join - Returns a relation containing all possible tuples that are
 a combination of two tuples, one from each of two specified
 relations, such that the two tuples contributing to any given result
 tuple have a common value for the common attributes of the two
 relations (and that common value appears just once, not twice, in
 that result tuple).  NOTE, This kind of join was