Re: [Drizzle-discuss] file system storage engine
On Fri, 18 Jun 2010 13:45:54 +0800, ZQ zim...@gmail.com wrote: could you please do some review over my branch? It would be helpful. Sorry this has taken a little while to get back to you, I've been in training all week. Hope you were still able to get some things done. Cool! I've spotted a few things for you to take a look at: - FilesystemCursor::close should check the return value of close(), especially for EINTR. - on non-EINTR errors, you can probably just return them and kind of hope that the upper layer reports these correctly. Since with buffered IO close(2) can also return all the errors that write() can, things like EIO may occur here. - FilesystemCursor::position() - avoid using things in the drizzled::internal namespace. Instead you can just cast ref to an off_t and store it in there. - FilesystemCursor::openUpdateFile() - on startup you should remove the .UPDATE file if it exists - getAllFields() is a bit misleadingly named. Perhaps recordToString() ? - is storing '0' for NULL a good idea? Perhaps another engine option. defaulting to 'NULL' is probably better too. - doInsertRecord() - you should check the return code of write(2) lots of interesting things can go wrong. - ::validateCreateTableOption() - instead of validating access to the file in validateCreateTableOption (which can only return 'success' or 'failure'), check this in ::doCreateTable which can return a real error message (e.g. ENOENT, EPERM, EIO etc) - FilesystemEngine::FilesystemEngine() - the mutex is destroyed in the destructor, it should be created in the constructor. - ::get_share() - use an ENUM for share-separator mode instead of number. I hope to have a closer look at the update/deletion code shortly. hopefully this helps. things are looking pretty good. One idea may be to go through the 'load data infile' tests, copy them and rewrite parts to use the filesystem engine instead. cheers, stewart -- Stewart Smith ___ Mailing list: https://launchpad.net/~drizzle-discuss Post to : drizzle-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~drizzle-discuss More help : https://help.launchpad.net/ListHelp
Re: [Drizzle-discuss] file system storage engine
On Fri, 18 Jun 2010 13:45:54 +0800, ZQ zim...@gmail.com wrote: I've updated my branch to fix the problems. could you please do some review over my branch? It would be helpful. Excellent. I'll look at it today. the locking part still has some problems, when two sessions try to update different rows, shouldn't this be prevented by table lock? yes. the next step would be: 1. investigate how other storage engine, say blitzdb handles the locking, and learn something from it. 2. add support for tag files so that this engine handle read files like /proc/meminfo directly. quite likely. let me have a look at your code too. -- Stewart Smith ___ Mailing list: https://launchpad.net/~drizzle-discuss Post to : drizzle-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~drizzle-discuss More help : https://help.launchpad.net/ListHelp
Re: [Drizzle-discuss] file system storage engine
Hi Stewart, I've updated my branch to fix the problems. could you please do some review over my branch? It would be helpful. the locking part still has some problems, when two sessions try to update different rows, shouldn't this be prevented by table lock? the next step would be: 1. investigate how other storage engine, say blitzdb handles the locking, and learn something from it. 2. add support for tag files so that this engine handle read files like /proc/meminfo directly. On Tue, Jun 8, 2010 at 9:02 AM, Stewart Smith stew...@flamingspork.comwrote: (CC'd drizzle-discuss in case anybody else has some thoughts too) On Mon, 07 Jun 2010 18:58:56 +0800, Zimin zim...@gmail.com wrote: lp:~ziminq/drizzle/filesystem-storage-engine A few things: - it looks like you haven't merged from trunk in a while, you should try to keep rather up to date with trunk. - in ha_filesystem::open - the table message is already loaded by the upper layer, so you can access it through table-share (see table_share.h) - It looks as though you're not trying to open the actual file there? - you should probably try to open the file there and report back errors - in doStartTableScan - instead of using cerr for errors, use something like push_warning_printf: (for example, from embedded_innodb): if (innodb_err != DB_SUCCESS) { push_warning_printf(session, DRIZZLE_ERROR::WARN_LEVEL_ERROR, ER_CANT_CREATE_TABLE, _(Cannot create table %s. InnoDB Error %d (%s)\n), innodb_table_name.c_str(), innodb_err, ib_strerror(innodb_err)); return HA_ERR_GENERIC; } - in rnd_next() - Don't use drizzled::String, instead use std::string. - have so you have a row separator as well as a column separator e.g. a CSV file has ROW_SEPARATOR='\n', COLUMN_SEPARATOR=',' - you should set each field not null using Field instead of using memset. - position() and rnd_pos() - you will need to implement these (check out my rnd_pos test for embedded_innodb). - they should be pretty easy. store the position in the file, seek to it. - you will need to set ref_length in ::open (see embedded_innodb for example) - info() - you'll need some things here, but shouldn't be hard. - locking - you probably want to write out the locking strategy. - possible have options? fcntl locks? - you have a table share (that isn't used), so the locking doesn't work :) - keep in mind that Drizzle is very much about removing horrible table level locks, so this is the responsibility of the engine. - insert/update row - you need to operate on the buffer you're given. this means using move_field_offset and that awful API (i've written a blog post on this recently) I would like to know whether such approach is OK or not. and I use iostream file operation, is it ok? I know that getline() is slow on big files. In theory it should be okay - but what about the case where rows are not delimited by newlines? to delete and update a line, I used a temp file and rename the temp file to the original one, it's kind of ugly. I think it'd be great if you also: - added more blueprints for each feature - targeted these blueprints for a series (Dexter) - targeted these blueprints for milestones (i.e. dates where you aim to have code with that functionality merged into the main tree) I do really like that there are tests there too! great work so far, hopefully the above helps! -- Stewart Smith ___ Mailing list: https://launchpad.net/~drizzle-discuss Post to : drizzle-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~drizzle-discuss More help : https://help.launchpad.net/ListHelp
Re: [Drizzle-discuss] file system storage engine
Hi Stewart, Thanks for the great info. I've added some blueprints to track my progress. For some of the questions, e.g. locking strategy, I've not come up with an elegant solution, I'll think a little more and discuss with you later. --zimin On 08/06/2010 9:02 AM, Stewart Smith wrote: (CC'd drizzle-discuss in case anybody else has some thoughts too) On Mon, 07 Jun 2010 18:58:56 +0800, Ziminzim...@gmail.com wrote: lp:~ziminq/drizzle/filesystem-storage-engine A few things: - it looks like you haven't merged from trunk in a while, you should try to keep rather up to date with trunk. - in ha_filesystem::open - the table message is already loaded by the upper layer, so you can access it through table-share (see table_share.h) - It looks as though you're not trying to open the actual file there? - you should probably try to open the file there and report back errors - in doStartTableScan - instead of using cerr for errors, use something like push_warning_printf: (for example, from embedded_innodb): if (innodb_err != DB_SUCCESS) { push_warning_printf(session, DRIZZLE_ERROR::WARN_LEVEL_ERROR, ER_CANT_CREATE_TABLE, _(Cannot create table %s. InnoDB Error %d (%s)\n), innodb_table_name.c_str(), innodb_err, ib_strerror(innodb_err)); return HA_ERR_GENERIC; } - in rnd_next() - Don't use drizzled::String, instead use std::string. - have so you have a row separator as well as a column separator e.g. a CSV file has ROW_SEPARATOR='\n', COLUMN_SEPARATOR=',' - you should set each field not null using Field instead of using memset. - position() and rnd_pos() - you will need to implement these (check out my rnd_pos test for embedded_innodb). - they should be pretty easy. store the position in the file, seek to it. - you will need to set ref_length in ::open (see embedded_innodb for example) - info() - you'll need some things here, but shouldn't be hard. - locking - you probably want to write out the locking strategy. - possible have options? fcntl locks? - you have a table share (that isn't used), so the locking doesn't work :) - keep in mind that Drizzle is very much about removing horrible table level locks, so this is the responsibility of the engine. - insert/update row - you need to operate on the buffer you're given. this means using move_field_offset and that awful API (i've written a blog post on this recently) I would like to know whether such approach is OK or not. and I use iostream file operation, is it ok? I know that getline() is slow on big files. In theory it should be okay - but what about the case where rows are not delimited by newlines? to delete and update a line, I used a temp file and rename the temp file to the original one, it's kind of ugly. I think it'd be great if you also: - added more blueprints for each feature - targeted these blueprints for a series (Dexter) - targeted these blueprints for milestones (i.e. dates where you aim to have code with that functionality merged into the main tree) I do really like that there are tests there too! great work so far, hopefully the above helps! ___ Mailing list: https://launchpad.net/~drizzle-discuss Post to : drizzle-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~drizzle-discuss More help : https://help.launchpad.net/ListHelp
Re: [Drizzle-discuss] file system storage engine
(CC'd drizzle-discuss in case anybody else has some thoughts too) On Mon, 07 Jun 2010 18:58:56 +0800, Zimin zim...@gmail.com wrote: lp:~ziminq/drizzle/filesystem-storage-engine A few things: - it looks like you haven't merged from trunk in a while, you should try to keep rather up to date with trunk. - in ha_filesystem::open - the table message is already loaded by the upper layer, so you can access it through table-share (see table_share.h) - It looks as though you're not trying to open the actual file there? - you should probably try to open the file there and report back errors - in doStartTableScan - instead of using cerr for errors, use something like push_warning_printf: (for example, from embedded_innodb): if (innodb_err != DB_SUCCESS) { push_warning_printf(session, DRIZZLE_ERROR::WARN_LEVEL_ERROR, ER_CANT_CREATE_TABLE, _(Cannot create table %s. InnoDB Error %d (%s)\n), innodb_table_name.c_str(), innodb_err, ib_strerror(innodb_err)); return HA_ERR_GENERIC; } - in rnd_next() - Don't use drizzled::String, instead use std::string. - have so you have a row separator as well as a column separator e.g. a CSV file has ROW_SEPARATOR='\n', COLUMN_SEPARATOR=',' - you should set each field not null using Field instead of using memset. - position() and rnd_pos() - you will need to implement these (check out my rnd_pos test for embedded_innodb). - they should be pretty easy. store the position in the file, seek to it. - you will need to set ref_length in ::open (see embedded_innodb for example) - info() - you'll need some things here, but shouldn't be hard. - locking - you probably want to write out the locking strategy. - possible have options? fcntl locks? - you have a table share (that isn't used), so the locking doesn't work :) - keep in mind that Drizzle is very much about removing horrible table level locks, so this is the responsibility of the engine. - insert/update row - you need to operate on the buffer you're given. this means using move_field_offset and that awful API (i've written a blog post on this recently) I would like to know whether such approach is OK or not. and I use iostream file operation, is it ok? I know that getline() is slow on big files. In theory it should be okay - but what about the case where rows are not delimited by newlines? to delete and update a line, I used a temp file and rename the temp file to the original one, it's kind of ugly. I think it'd be great if you also: - added more blueprints for each feature - targeted these blueprints for a series (Dexter) - targeted these blueprints for milestones (i.e. dates where you aim to have code with that functionality merged into the main tree) I do really like that there are tests there too! great work so far, hopefully the above helps! -- Stewart Smith ___ Mailing list: https://launchpad.net/~drizzle-discuss Post to : drizzle-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~drizzle-discuss More help : https://help.launchpad.net/ListHelp