RE: Database interface - would like advice on oracle library binding

2003-09-25 Thread oleg

The following code illustrates a _generic_ interface to low-level
database code. The left-fold iterator doQuery is completely generic
over any possible iterator -- no matter how many columns the query
returns, what are the types of these columns and what is the type of
the seed (accumulator). The code for doQuery remains the same. The
iterator allocates buffers for columns at the beginning and frees the
buffers at the very end. Again, this buffer handling is generic. There
is no longer need to write extraction/unmarshalling function for
specific types of rows. We only need fetching functions for specific
datatypes (not columns!). Again, the query and the row buffer
management code is completely generic. I guess I'm repeating
myself. The tests:

-- Query returns one column of type String
-- Never mind undefined: we return some static data in the buffers,
-- we don't have any oracle to bind to
test1 = doQuery undefined undefined iter1 ([]::[String])
  where
 iter1:: String - [String] - Either [String] [String]
 iter1 s acc = Right $ s:acc 


-- Query returns two columns of types String and Int
test2 = doQuery undefined undefined iter2 ([]::[(String,Int)])
  where
 iter2:: String - Int - [(String,Int)] - 
 Either [(String,Int)] [(String,Int)]
 iter2 s i acc = Right $ (s,i):acc 


-- Query returns three columns of types Int, String and Int
test3 = doQuery undefined undefined iter3 ([]::[(Int,String,Int)])
  where
 iter3:: Int - String - Int - [(Int,String,Int)] - 
 Either [(Int,String,Int)] [(Int,String,Int)]
 iter3 i1 s i2 acc = Right $ (i1,s,i2):acc 

Use the function runtests to run either of these tests.


The code follows. Compiler flags: 
-fglasgow-exts -fallow-overlapping-instances

-- DB column buffers

type BufferSize = Int
data BufferType = ORA_char | ORA_int
type Position = Int  -- column number of the result table

data Buffer = Buffer { bufptr :: String -- for this stub, just use String
 , nullindptr :: String -- likewise
 , retsizeptr :: String -- likewise
 , size:: BufferSize 
 , pos:: Position
 , ora_type:: BufferType }

-- understandably, below is just a stub ...
alloc_buffer (siz, typ) ps = 
  return $ Buffer { bufptr = show ps, pos = ps,  size = siz, ora_type = typ}
  
-- In this stub, don't do anything
free ptr = return ()


-- DB Column types

class DBType a where
  alloc_buffer_hints:: a - (BufferSize, BufferType)
  col_fetch:: Buffer - IO a

instance DBType String where
  alloc_buffer_hints _ = (2000, ORA_char)
  col_fetch buffer = return (bufptr buffer)

instance DBType Int where
  alloc_buffer_hints _ = (4, ORA_int)
  col_fetch buffer = return (read $ bufptr buffer)
  
-- need to add more ...

-- Row iteratees. Note, the folowing two instances cover ALL possible
-- iteratees. No other instances are needed

class SQLIteratee iter seed where
iter_apply:: [Buffer] - seed - iter - IO (Either seed seed)
alloc_buffers:: Position - iter - seed - IO [Buffer]

instance (DBType a) = SQLIteratee (a-seed-Either seed seed) seed where
iter_apply [buf] seed fn = col_fetch buf = (\v - return$ fn v seed)
alloc_buffers n _ _ = 
   sequence [alloc_buffer (alloc_buffer_hints (undefined::a)) n]

instance (SQLIteratee iter' seed, DBType a) = SQLIteratee (a-iter') seed 
 where
iter_apply (buf:others) seed fn = 
  col_fetch buf = (\v - iter_apply others seed (fn v))
alloc_buffers n fn seed = do
  this_buffer - alloc_buffer (alloc_buffer_hints (undefined::a)) n
  other_buffers - alloc_buffers (n+1) (fn (undefined::a)) seed
  return (this_buffer:other_buffers)

free_buffers = mapM_ free

-- The left fold iterator -- the query executor

data Session   -- not relevant for this example
data SQLStmt

db_execute session query = return ()

db_fetch_row buffers = return ()  -- use static data

doQuery:: (SQLIteratee iter seed) = Session - SQLStmt - iter - seed - IO seed

-- In this example, we just allocate buffers, fetch two rows and terminate
-- with a clean-up

doQuery session query iteratee seed = do
  buffers - alloc_buffers 0 iteratee seed
  db_execute session query
  db_fetch_row buffers
  (Right seed) - iter_apply buffers seed iteratee
  db_fetch_row buffers
  (Right seed) - iter_apply buffers seed iteratee
  free_buffers buffers
  return seed
  

-- Tests

-- Query returns one column of type String
test1 = doQuery undefined undefined iter1 ([]::[String])
  where
 iter1:: String - [String] - Either [String] [String]
 iter1 s acc = Right $ s:acc 


-- Query returns two columns of types String and Int
test2 = doQuery undefined undefined iter2 ([]::[(String,Int)])
  where
 iter2:: String - Int - [(String,Int)] - 
 Either [(String,Int)] [(String,Int)]
 iter2 s i acc = Right $ (s,i):acc 


-- Query returns three columns of types Int, String and Int
test3 = doQuery 

Database interface - would like advice on oracle library binding

2003-09-23 Thread Bayley, Alistair
(2nd attempt; mailman thinks I'm not a list member, but it still keeps
sending me mail.)

Still making slow progress on an Oracle database binding... now I'm trying
to fit the API I have into some sort of abstract interface (like the one(s)
discussed previously:
 http://haskell.org/pipermail/haskell-cafe/2003-August/004957.html ).


1. Is the left-fold the best/only interface to expose? I think yes, but that
doesn't allow fine-grained control over cursors i.e. being able to open many
cursors at once and interleave fetches from them. Or does it?


2. I'm finding it hard to write a doQuery function that takes an extraction
function that isn't a pig to write. Some advice would be useful here... (and
a long-ish explanation follows):

The Oracle Call Interface (OCI) requires that I allocate buffers for the
result of a single row fetch, before the first row is fetched. So a query
involves:
 - prepare statement etc
 - allocate buffers (one for each column - call OCI C function
DefineByPos)
 - fetch row
 - extract/marshal data from buffer into Haskell types (which are then
processed by fold function)
 - fetch, marshal (repeat until no more rows)
 - free buffers

i.e. the same buffers are re-used for each row.

The problem for me is how to specify the left-fold function in terms of the
low-level API. If I want to supply extraction functions (to extract Haskell
values from result buffer), how do I manage the buffer allocation in the
doQuery function? The buffer allocate/free code also needs to know the
column positions and types, in the same manner as the extract functions.

I want to be able to write code like this:

results - doQuery dbconn sqltext [] \row results - do
name- stringv row 1
address - stringv row 2
return (name,address):results

.. where the stringv function extracts/marshals a Haskell String from the
result buffer.

The intermediate approach I currently have means I have to pass an IO action
into the doQuery function that, when evaluated, allocates the buffer and
returns two more actions:
 - an action that extracts the row as a tuple
 - another action that frees the buffer

The doQuery function evaluates the initial action (to allocate the buffer),
uses the extract action to build the result (at present a list), and when
there are no more rows, uses the free action to free the buffer.

This approach is quite awkward (especially w.r.t. writing extract
functions), and it's hard for me to see how to build a better interface.
Hard, because of the memory management requirements.



Here's a chunk of the code. A lot of it is OCI plumbing, but I hope you can
see how awkward it is to create an extract function (see ex3 at the bottom).

Given pointers to the buffer, extract a string of variable length (you have
to terminate it yourself).

 fetchStringVal :: OCIColInfo - IO String
 fetchStringVal (_, bufptr, nullindptr, retsizeptr) = do
   retsize - liftM cShort2Int (peek retsizeptr)
   nullind - liftM cShort2Int (peek nullindptr) -- unused
   pokeByteOff (castPtr bufptr) retsize nullByte
   val - peekCString (castPtr bufptr)
   return val

Free a single column's buffer.

 freeColBuffer :: OCIColInfo - IO ()
 freeColBuffer (_, bufptr, nullindptr, retsizeptr) = do
   free bufptr
   free retsizeptr
   free nullindptr

Create a buffer for a string column, and return the extract and free IO
actions.

 getExtractFnString :: Int - ErrorHandle - StmtHandle - IO (IO String,
IO ())
 getExtractFnString posn err stmt = do
   c - defineCol err stmt posn 2000 oci_SQLT_CHR
   return ((fetchStringVal c), (freeColBuffer c))


doQuery uses the extractFns action to create the result buffer,
and the two actions (extract and free) which are passed to doQuery2.

 doQuery2 :: ErrorHandle - StmtHandle - IO a - IO () - [a] - IO [a]
 doQuery2 err stmt extractData freeMem results = do
   rc - fetch err stmt
   if rc == oci_NO_DATA
 then do
   freeMem
   return results
 else do
   v - extractData
   doQuery2 err stmt extractData freeMem (v:results)

 doQuery :: Session - String - (ErrorHandle - StmtHandle - IO (IO a, IO
())) - IO [a]
 doQuery (Sess env err con) qry extractFns = do
   stmt - getStmt env err
   prepare err stmt qry
   execute err con stmt
   (extractData, freeMem) - extractFns err stmt
   doQuery2 err stmt extractData freeMem []


The interface provided by doQuery means I have to write extract functions
like this.
Here's one for a select that returns three String columns.
It's quite awkward...

 ex3 :: ErrorHandle - StmtHandle - IO (IO (String, String, String), IO
())
 ex3 err stmt = do
   (fetchcol1, freecol1) - getExtractFnString 1 err stmt
   (fetchcol2, freecol2) - getExtractFnString 2 err stmt
   (fetchcol3, freecol3) - getExtractFnString 3 err stmt
   return
 ( do { s1 - fetchcol1; s2 - fetchcol2; s3 - fetchcol3; return (s1,
s2, s3) }
 , do { freecol1; freecol2; freecol3 }
 )



Database interface - would like advice on oracle library binding

2003-09-23 Thread Tom Pledger
Bayley, Alistair writes:
 :
 | Still making slow progress on an Oracle database binding... now I'm trying
 | to fit the API I have into some sort of abstract interface (like the one(s)
 | discussed previously:
 |  http://haskell.org/pipermail/haskell-cafe/2003-August/004957.html ).
 | 
 | 
 | 1. Is the left-fold the best/only interface to expose? I think yes,
 | but that doesn't allow fine-grained control over cursors i.e. being
 | able to open many cursors at once and interleave fetches from
 | them. Or does it?

It looks like the interleaving would be limited to a nested loop
structure: a cursor could be processed in full during one extraction
for another cursor.

Application-side nested loop structures are often a poor substitute
for server-side joins.

 | 2. I'm finding it hard to write a doQuery function that takes an
 | extraction function that isn't a pig to write. Some advice would be
 | useful here... (and a long-ish explanation follows):
 :

Here's my attempt to summarise the piggishness you describe:

The interface to Oracle requires that you initialise a cursor by
allocating a suitably typed buffer for each column prior to
fetching the first row, and finalise a cursor by freeing those
buffers after fetching the last row.

This means that we must iterate over the columns 3 times.  We
would prefer to express this iteration only once, and have the
other 2 happen automatically within the library.  (As opposed to
what ex3 does, which is to iterate for getExtractFnString, iterate
for fetchcolN, and iterate for freecolN.)

Here's one approach: find the OCI equivalent of JDBC's
ResultSetMetaData, and use it to drive the allocation and freeing of
buffers.

Here's another:

Add a mode attribute to the abstract data type which encompasses
ErrorHandle and StmtHandle.  (I'll persist in referring to that
ADT as Cursor.)

Expect extraction functions to be written along these lines:

\cursor result
   - do field1 - getIntcursor
 field2 - getString cursor
 field3 - getString cursor
 return ((field1, field2, field3):result, True)

Make getInt (and friends) behave differently depending on the mode
of the cursor they're passed: either allocate a buffer and return
_|_, decode and return the current column of the current row, or
free a buffer and return _|_.

doQuery could then apply the extraction function once in Allocate
mode after opening the cursor, once per fetched row in Decode
mode, and once in Free mode at the end.

There's nothing to stop an extraction function from varying the number
of get___ functions it applies, or trying to match their results when
not in Decode mode.  These weakness could be mitigated by:

Pointing out that some database connection standards (JDBC, and
for all I know also ODBC) don't guarantee that you can still get
at a row's 1st column after you've looked at its 2nd column,
i.e. there's a precedent for such fragility.

Complicating the extraction functions by giving them the type

(Cursor - b - IO (IO (b, Bool)))

, expecting that all the get___ functions are applied in the outer
IO layer, and undertaking that the inner IO layer will only be
used in Decode mode.

Regards,
Tom

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe