Re: [Drizzle-discuss] file system storage engine

2010-06-24 Thread Stewart Smith
On Fri, 18 Jun 2010 13:45:54 +0800, ZQ zim...@gmail.com wrote:
 could you please do some review over my branch? It would be helpful.

Sorry this has taken a little while to get back to you, I've been in
training all week. Hope you were still able to get some things done.

Cool! I've spotted a few things for you to take a look at:

- FilesystemCursor::close should check the return value of close(),
  especially for EINTR.
  - on non-EINTR errors, you can probably just return them and kind of
  hope that the upper layer reports these correctly. Since with buffered
  IO close(2) can also return all the errors that write() can, things
  like EIO may occur here.

- FilesystemCursor::position()
  - avoid using things in the drizzled::internal namespace. Instead you
can just cast ref to an off_t and store it in there.

- FilesystemCursor::openUpdateFile()
  - on startup you should remove the .UPDATE file if it exists

- getAllFields() is a bit misleadingly named. Perhaps recordToString() ?
  - is storing '0' for NULL a good idea? Perhaps another engine
option. defaulting to 'NULL' is probably better too.

- doInsertRecord()
  - you should check the return code of write(2) lots of interesting
things can go wrong.

- ::validateCreateTableOption()
  - instead of validating access to the file in
validateCreateTableOption (which can only return 'success' or
'failure'), check this in ::doCreateTable which can return a real
error message (e.g. ENOENT, EPERM, EIO etc)

- FilesystemEngine::FilesystemEngine()
  - the mutex is destroyed in the destructor, it should be created in
the constructor.

- ::get_share()
  - use an ENUM for share-separator mode instead of number.

I hope to have a closer look at the update/deletion code shortly.

hopefully this helps. things are looking pretty good. One idea may be to
go through the 'load data infile' tests, copy them and rewrite parts to
use the filesystem engine instead.

cheers,
stewart

-- 
Stewart Smith

___
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : drizzle-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp


Re: [Drizzle-discuss] file system storage engine

2010-06-22 Thread Stewart Smith
On Fri, 18 Jun 2010 13:45:54 +0800, ZQ zim...@gmail.com wrote:
 I've updated my branch to fix the problems.
 could you please do some review over my branch? It would be helpful.

Excellent.

I'll look at it today.

 the locking part still has some problems, when two sessions try to update
 different rows,
 shouldn't this be prevented by table lock?

yes.

 the next step would be:
 1. investigate how other storage engine, say blitzdb  handles the locking,
 and learn something from it.
 2. add support for tag files so that this engine handle read files like
 /proc/meminfo directly.

quite likely. let me have a look at your code too.

-- 
Stewart Smith

___
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : drizzle-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp


Re: [Drizzle-discuss] file system storage engine

2010-06-17 Thread ZQ
Hi Stewart,

I've updated my branch to fix the problems.
could you please do some review over my branch? It would be helpful.
the locking part still has some problems, when two sessions try to update
different rows,
shouldn't this be prevented by table lock?

the next step would be:
1. investigate how other storage engine, say blitzdb  handles the locking,
and learn something from it.
2. add support for tag files so that this engine handle read files like
/proc/meminfo directly.


On Tue, Jun 8, 2010 at 9:02 AM, Stewart Smith stew...@flamingspork.comwrote:

 (CC'd drizzle-discuss in case anybody else has some thoughts too)

 On Mon, 07 Jun 2010 18:58:56 +0800, Zimin zim...@gmail.com wrote:
lp:~ziminq/drizzle/filesystem-storage-engine

 A few things:
 - it looks like you haven't merged from trunk in a while, you should try
 to keep rather up to date with trunk.
 - in ha_filesystem::open
  - the table message is already loaded by the upper layer, so you can
access it through table-share (see table_share.h)
  - It looks as though you're not trying to open the actual file there?
  - you should probably try to open the file there and report back
  errors
 - in doStartTableScan
  - instead of using cerr for errors, use something like
  push_warning_printf:

  (for example, from embedded_innodb):

  if (innodb_err != DB_SUCCESS)
  {
push_warning_printf(session, DRIZZLE_ERROR::WARN_LEVEL_ERROR,
ER_CANT_CREATE_TABLE,
_(Cannot create table %s. InnoDB Error %d (%s)\n),
innodb_table_name.c_str(), innodb_err,
 ib_strerror(innodb_err));
return HA_ERR_GENERIC;
  }

 - in rnd_next()
  - Don't use drizzled::String, instead use std::string.
  - have so you have a row separator as well as a column separator
e.g. a CSV file has ROW_SEPARATOR='\n', COLUMN_SEPARATOR=','
  - you should set each field not null using Field instead of using
memset.

 - position() and rnd_pos()
  - you will need to implement these (check out my rnd_pos test for
embedded_innodb).
  - they should be pretty easy. store the position in the file, seek to
it.
  - you will need to set ref_length in ::open (see embedded_innodb for
 example)

 - info()
  - you'll need some things here, but shouldn't be hard.

 - locking
  - you probably want to write out the locking strategy.
  - possible have options? fcntl locks?
  - you have a table share (that isn't used), so the locking doesn't
work :)
  - keep in mind that Drizzle is very much about removing horrible table
level locks, so this is the responsibility of the engine.

 - insert/update row
  - you need to operate on the buffer you're given. this means using
move_field_offset and that awful API (i've written a blog post on
this recently)

  I would like to know whether such approach is OK or not.
  and I use iostream file operation, is it ok? I know that getline() is
  slow on big files.

 In theory it should be okay - but what about the case where rows are not
 delimited by newlines?

  to delete and update a line, I used a temp file and rename the temp file
  to the original one, it's kind of ugly.

 I think it'd be great if you also:
 - added more blueprints for each feature
 - targeted these blueprints for a series (Dexter)
 - targeted these blueprints for milestones (i.e. dates where you aim to
 have code with that functionality merged into the main tree)

 I do really like that there are tests there too!

 great work so far, hopefully the above helps!
 --
 Stewart Smith

___
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : drizzle-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp


Re: [Drizzle-discuss] file system storage engine

2010-06-08 Thread Zimin

Hi Stewart,

Thanks for the great info.
I've added some blueprints to track my progress.
For some of the questions, e.g. locking strategy, I've not come up with 
an elegant solution,

I'll think a little more and discuss with you later.

--zimin

On 08/06/2010 9:02 AM, Stewart Smith wrote:

(CC'd drizzle-discuss in case anybody else has some thoughts too)

On Mon, 07 Jun 2010 18:58:56 +0800, Ziminzim...@gmail.com  wrote:
   

   lp:~ziminq/drizzle/filesystem-storage-engine
 

A few things:
- it looks like you haven't merged from trunk in a while, you should try
to keep rather up to date with trunk.
- in ha_filesystem::open
   - the table message is already loaded by the upper layer, so you can
 access it through table-share (see table_share.h)
   - It looks as though you're not trying to open the actual file there?
   - you should probably try to open the file there and report back
   errors
- in doStartTableScan
   - instead of using cerr for errors, use something like
   push_warning_printf:

   (for example, from embedded_innodb):

   if (innodb_err != DB_SUCCESS)
   {
 push_warning_printf(session, DRIZZLE_ERROR::WARN_LEVEL_ERROR,
 ER_CANT_CREATE_TABLE,
 _(Cannot create table %s. InnoDB Error %d (%s)\n),
 innodb_table_name.c_str(), innodb_err, 
ib_strerror(innodb_err));
 return HA_ERR_GENERIC;
   }

- in rnd_next()
   - Don't use drizzled::String, instead use std::string.
   - have so you have a row separator as well as a column separator
 e.g. a CSV file has ROW_SEPARATOR='\n', COLUMN_SEPARATOR=','
   - you should set each field not null using Field instead of using
 memset.

- position() and rnd_pos()
   - you will need to implement these (check out my rnd_pos test for
 embedded_innodb).
   - they should be pretty easy. store the position in the file, seek to
 it.
   - you will need to set ref_length in ::open (see embedded_innodb for example)

- info()
   - you'll need some things here, but shouldn't be hard.

- locking
   - you probably want to write out the locking strategy.
   - possible have options? fcntl locks?
   - you have a table share (that isn't used), so the locking doesn't
 work :)
   - keep in mind that Drizzle is very much about removing horrible table
 level locks, so this is the responsibility of the engine.

- insert/update row
   - you need to operate on the buffer you're given. this means using
 move_field_offset and that awful API (i've written a blog post on
 this recently)

   

I would like to know whether such approach is OK or not.
and I use iostream file operation, is it ok? I know that getline() is
slow on big files.
 

In theory it should be okay - but what about the case where rows are not
delimited by newlines?

   

to delete and update a line, I used a temp file and rename the temp file
to the original one, it's kind of ugly.
 

I think it'd be great if you also:
- added more blueprints for each feature
- targeted these blueprints for a series (Dexter)
- targeted these blueprints for milestones (i.e. dates where you aim to
have code with that functionality merged into the main tree)

I do really like that there are tests there too!

great work so far, hopefully the above helps!
   



___
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : drizzle-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp


Re: [Drizzle-discuss] file system storage engine

2010-06-07 Thread Stewart Smith
(CC'd drizzle-discuss in case anybody else has some thoughts too)

On Mon, 07 Jun 2010 18:58:56 +0800, Zimin zim...@gmail.com wrote:
   lp:~ziminq/drizzle/filesystem-storage-engine

A few things:
- it looks like you haven't merged from trunk in a while, you should try
to keep rather up to date with trunk.
- in ha_filesystem::open
  - the table message is already loaded by the upper layer, so you can
access it through table-share (see table_share.h)
  - It looks as though you're not trying to open the actual file there?
  - you should probably try to open the file there and report back
  errors
- in doStartTableScan
  - instead of using cerr for errors, use something like
  push_warning_printf:

  (for example, from embedded_innodb):

  if (innodb_err != DB_SUCCESS)
  {
push_warning_printf(session, DRIZZLE_ERROR::WARN_LEVEL_ERROR,
ER_CANT_CREATE_TABLE,
_(Cannot create table %s. InnoDB Error %d (%s)\n),
innodb_table_name.c_str(), innodb_err, 
ib_strerror(innodb_err));
return HA_ERR_GENERIC;
  }

- in rnd_next()
  - Don't use drizzled::String, instead use std::string.
  - have so you have a row separator as well as a column separator
e.g. a CSV file has ROW_SEPARATOR='\n', COLUMN_SEPARATOR=','
  - you should set each field not null using Field instead of using
memset.

- position() and rnd_pos()
  - you will need to implement these (check out my rnd_pos test for
embedded_innodb).
  - they should be pretty easy. store the position in the file, seek to
it.
  - you will need to set ref_length in ::open (see embedded_innodb for example)

- info()
  - you'll need some things here, but shouldn't be hard.

- locking
  - you probably want to write out the locking strategy.
  - possible have options? fcntl locks?
  - you have a table share (that isn't used), so the locking doesn't
work :)
  - keep in mind that Drizzle is very much about removing horrible table
level locks, so this is the responsibility of the engine.

- insert/update row
  - you need to operate on the buffer you're given. this means using
move_field_offset and that awful API (i've written a blog post on
this recently)

 I would like to know whether such approach is OK or not.
 and I use iostream file operation, is it ok? I know that getline() is 
 slow on big files.

In theory it should be okay - but what about the case where rows are not
delimited by newlines?

 to delete and update a line, I used a temp file and rename the temp file 
 to the original one, it's kind of ugly.

I think it'd be great if you also:
- added more blueprints for each feature
- targeted these blueprints for a series (Dexter)
- targeted these blueprints for milestones (i.e. dates where you aim to
have code with that functionality merged into the main tree)

I do really like that there are tests there too!

great work so far, hopefully the above helps!
-- 
Stewart Smith

___
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : drizzle-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp