Re: DbDataStore implementation

2007-12-13 Thread Thomas Mueller
Hi,

 I have a few other suggestions for code changes:
 * add a getter for the identifier in BLOBInDataStore (to be used by the GC)

Why do you need this?

 At the very least, I would put all the statements in the properties file,
 because there's no documentation about them.

Repeating the statements is not a good idea (they could get out of
sync). I will add Javadoc documentation.

 I meant that the table name should be configurable. I used the
 ${schemaObjectPrefix}

Done. The prefix is now configurable. I don't know why it is needed,
though; could you explain?

Regards,
Thomas


Re: DbDataStore implementation

2007-12-03 Thread Thomas Mueller
Hi,

 multiple connections
 from a connection pool.

I understand. It's probably not a good idea to open a new connection
for each operation, because for some databases opening a connection
takes a very long time. I have implemented a simple pool (with maxSize)
now; I'd like to test it a little before committing it.
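A bounded pool of the kind described could look roughly like the sketch below. This is only an illustration under stated assumptions (a generic resource type and a factory that opens connections); the class and method names are not the actual Jackrabbit code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal sketch of a bounded pool with a maxSize, as described above.
// SimplePool and its methods are illustrative names, not the real API.
class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int maxSize;
    private int created;

    SimplePool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    T get() {
        T t = idle.poll();
        if (t != null) {
            return t;                 // reuse an idle connection
        }
        synchronized (this) {
            if (created < maxSize) {
                created++;
                return factory.get(); // open a new one, up to maxSize
            }
        }
        try {
            return idle.take();       // pool exhausted: wait for a return
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    void put(T t) {
        idle.offer(t);                // return the connection to the pool
    }
}
```

The point is that opening happens lazily and at most maxSize times; callers beyond that block until a connection is returned.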

It probably makes sense to make the class extensible, but I don't know
yet where exactly this should happen. I will make all methods and
variables protected.
If possible, I'd like to avoid creating a new class for each database
type, but if this is required, only a few methods should need to be
implemented there.

 I tested with SQL Server 2005 with the default statements, and they don't
 work because the BLOB data type is called IMAGE there. I changed it and
 everything else works fine.

OK. So the bugfix is to replace sqlserver.properties with:
driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
createTable=CREATE TABLE DATASTORE(ID VARCHAR(255) PRIMARY KEY, LENGTH
BIGINT, LAST_MODIFIED BIGINT, DATA IMAGE)

Is this OK?

 No, it's not required, but if all you want to do is change the table name,
 I think it's too much having to re-write all the statements.

I don't understand: why would you want to change the table name specifically
for one database type? Or do you mean making the table name
configurable? That does make sense (if you want to store it in another
schema, for example). Should I implement the ${schemaObjectPrefix} as
you have done?
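The ${schemaObjectPrefix} substitution mentioned here amounts to a simple placeholder replacement in each statement template. A minimal sketch, assuming the statements carry the placeholder (the sample statement and class name are illustrative):

```java
// Sketch of the ${schemaObjectPrefix} substitution discussed above.
// The placeholder is replaced with the configured prefix before the
// statement is prepared.
class PrefixSubstitution {
    static String substitute(String sql, String prefix) {
        return sql.replace("${schemaObjectPrefix}", prefix);
    }
}
```

With a prefix like "MY_", a template such as `DELETE FROM ${schemaObjectPrefix}DATASTORE WHERE ID=?` becomes `DELETE FROM MY_DATASTORE WHERE ID=?`.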

 Same goes for something as simple as changing BLOB to IMAGE
 in only one statement.

I think it's OK to add the whole CREATE TABLE statement to the configuration file.

 Session.save() - various calls to DbDataStore.getRecord() and
  DbDataRecord.getStream()

 Just doing the following in a class that extends AbstractJCRTest is enough

 Session session = helper.getSuperuserSession();
 Node root = session.getRootNode();
 root.setProperty("notice", new FileInputStream("NOTICE.txt"));
 session.save();

OK, thanks for the test case! I didn't have time yet to find out how to
solve this performance problem.

 the code calls usesIdentifier() at the end of addRecord()
 wouldn't it be better to call it as soon as the definitive identifier is available?

You are right! I will fix this.

Regards,
Thomas


Re: DbDataStore implementation

2007-12-03 Thread Esteban Franqueiro
 multiple connections
 from a connection pool.

 I understand. It's probably not a good idea to open a new connection
 for each operation, because for some databases opening a connection
 takes a very long time. I have implemented a simple pool (with maxSize)
 now; I'd like to test it a little before committing it.

Indeed, the code I posted did exactly that, but the idea was that it was very
easy to subclass it and change the way a connection is obtained.

 It probably makes sense to make the class extensible, but I don't know
 yet where exactly this should be. I will make all methods and
 variables protected.

That's a start. Ideally, I think we should do something like the template 
method pattern.
I have a few other suggestions for code changes:
* add a getter for the identifier in BLOBInDataStore (to be used by the GC)
* make RepositoryImpl.getWorkspaceNames() public
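The template-method idea mentioned above could look roughly like this: the base class fixes the shape of each statement, and subclasses override only the dialect-specific hooks. All names here are illustrative, not the actual DbDataStore API; the IMAGE/BLOB difference is the one reported for SQL Server below.

```java
// Sketch of the template method pattern for dialect handling.
// StatementTemplate and its methods are illustrative names.
abstract class StatementTemplate {
    protected String tableName() { return "DATASTORE"; }

    protected abstract String blobType(); // e.g. "BLOB", or "IMAGE" on SQL Server

    // template method: fixed statement skeleton, variable dialect hooks
    final String createTableSql() {
        return "CREATE TABLE " + tableName()
                + "(ID VARCHAR(255) PRIMARY KEY, LENGTH BIGINT,"
                + " LAST_MODIFIED BIGINT, DATA " + blobType() + ")";
    }
}

class SqlServerStatements extends StatementTemplate {
    protected String blobType() { return "IMAGE"; } // SQL Server has no BLOB type
}
```

A subclass then only supplies the pieces that actually differ, instead of rewriting every statement.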

 I tested with SQL Server 2005 with the default statements, and they don't
 work because the BLOB data type is called IMAGE there. I changed it and
 everything else works fine.

 OK. So the bugfix is to replace sqlserver.properties with:
 driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
 createTable=CREATE TABLE DATASTORE(ID VARCHAR(255) PRIMARY KEY, LENGTH
 BIGINT, LAST_MODIFIED BIGINT, DATA IMAGE)

 Is this OK?

At the very least, I would put all the statements in the properties file,
because there's no documentation about them.

 No, it's not required, but if all you want to do is change the table name,
 I think it's too much having to re-write all the statements.

 I don't understand, why do you want to change the table name specially
 for one database type? Or do you mean make the table name
 configurable? That does make sense (if you want to store it in another
 schema for example). Should I implement the ${schemaObjectPrefix} as
 you have done?

Yes, I meant that the table name should be configurable. I used the
${schemaObjectPrefix} idea because it's the way it's done in other parts
of the core.

 Just doing the following in a class that extends AbstractJCRTest is enough

 Session session = helper.getSuperuserSession();
 Node root = session.getRootNode();
 root.setProperty("notice", new FileInputStream("NOTICE.txt"));
 session.save();

 OK, thanks for the test case! I didn't have time yet to find out how to
 solve this performance problem.

On a related note, is it by design that when you ask for a stream it is first
written to disk?
Because if it's not, then I think another problem will arise later. Since the
streams are written to disk, and from there sent to the clients, you can have
multiple streams open at the same time (that's the idea). But if you fix this
and start feeding the streams straight from the DB, then, since you only have
one connection, you can have only one outstanding stream. Maybe I'm wrong
here, but in my initial testing I couldn't feed two or more streams through
the same connection.
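The design trade-off here can be sketched in a few lines: spooling the blob to a temp file releases the database stream (and with it the single shared connection) as soon as the copy finishes, so many client streams can be open at once. The helper below is an illustration of that idea, not the actual DbDataStore code.

```java
import java.io.*;

// Sketch: copy the DB stream to a temp file so the connection is freed
// immediately; clients then read from the file, not the connection.
class StreamSpool {
    static InputStream copyToTempFile(InputStream dbStream) {
        try {
            File temp = File.createTempFile("datastore", ".tmp");
            temp.deleteOnExit();
            try (OutputStream out = new FileOutputStream(temp)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = dbStream.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                dbStream.close(); // the connection can be reused from here on
            }
            return new FileInputStream(temp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // small helper to drain a stream, used only for demonstration
    static String readAll(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) sb.append((char) c);
            in.close();
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Feeding streams straight from the DB would instead pin one connection per outstanding stream, which is exactly the concern raised above.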

Regards,

Esteban Franqueiro
[EMAIL PROTECTED] 


Notice:  This email message, together with any attachments, may contain 
information  of  BEA Systems,  Inc.,  its subsidiaries  and  affiliated 
entities,  that may be confidential,  proprietary,  copyrighted  and/or legally 
privileged, and is intended solely for the use of the individual or entity 
named in this message. If you are not the intended recipient, and have received 
this message in error, please immediately return this by email and then delete 
it.


Re: DbDataStore implementation

2007-11-29 Thread Esteban Franqueiro
Hi.

 * doesn't synchronizing the addRecord() method, and using only one
 connection defeat one of the purposes of the data store of allowing
 maximum concurrency?

 Yes that's true. Using the data store itself improves concurrency as
 simple (non-blob) repository operations are not blocked by operations
 that involve blobs. Using multiple connections could improve
 concurrency and could even speed up the process (if the database
 writes to multiple hard drives). So far I have not thought about that.

While it's true that just having the data store offloads a lot from
Jackrabbit, I think it's not enough.

 The question is: how important is this feature?

Well, for me it's crucial. As you can see, the code I posted uses multiple
connections, possibly from a connection pool.
If this feature is not as important as I think it is, then I still think we
should at least not prevent an easy extension that could add it. Currently,
having all methods and fields private means that in order to extend the DB
data store you have to reimplement every method.

 * making the SQL strings private and not initializing them in a method of
 its own really complicates extending the implementation

 Sorry I have committed the properties files to the wrong folder first!
 I have fixed it now. The SQL statements can be overloaded in the
 databaseType.properties file in
 src/main/resources/org/apache/jackrabbit/core/data/db. Currently they
 are not overloaded, but maybe they need to be. I have only tested
 derby and H2 so far. initDatabaseType() loads the properties file.

I tested with SQL Server 2005 with the default statements, and they don't
work because the BLOB data type is called IMAGE there. I changed it and
everything else works fine.

 (in any case, the SQL strings should be written as "UPDATE " + tableSQL
 + "DATASTORE SET DATA=? WHERE ID=?")

 Both the table name and the SQL strings can be overloaded (in the
 properties file), so building the SQL statements is not required in my
 view.

No, it's not required, but if all you want to do is change the table name, I
think it's too much having to re-write all the statements. Same goes for
something as simple as changing BLOB to IMAGE in only one statement.

 * during a Session.save() there are various calls to DbDataStore.getRecord()
 and DbDataRecord.getStream(), for storing the blob in the blobStore. Why is
 this necessary if the binary content is already in the data store? It seems
 that this copy is overwritten every time, but I don't see the reason for all
 these calls to the DB, and file copies.

 That's not good. I like to solve this problem. Does this occur when
 simply storing a node with a large object? If not, do you have a
 simple test case?

Just doing the following in a class that extends AbstractJCRTest is enough:

Session session = helper.getSuperuserSession();
Node root = session.getRootNode();
root.setProperty("notice", new FileInputStream("NOTICE.txt"));
session.save();

The save() causes two getRecord() calls and one getStream() call. And the
stream is copied to the filesystem in a temp file.

Another thing I noticed is that the code calls usesIdentifier() at the end
of addRecord(). Wouldn't it be better to call it as soon as the definitive
identifier is available, in case the GC happens to run in between?
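The ordering concern can be sketched like this, with illustrative names throughout (the inUse set stands in for the usesIdentifier() bookkeeping, and the hash is a stand-in for the real content identifier):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch of the suggested ordering: mark the identifier as in use as soon
// as it is definitive, so a concurrently running garbage collection cannot
// delete the freshly added record before addRecord() returns.
class AddRecordSketch {
    final Set<String> inUse = Collections.synchronizedSet(new HashSet<>());

    String addRecord(byte[] data) {
        String id = Integer.toHexString(Arrays.hashCode(data)); // stand-in for the content hash
        inUse.add(id); // usesIdentifier(): call here, not at the very end
        // ... update LAST_MODIFIED, close statements, etc. ...
        return id;
    }
}
```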
I'll keep playing with it and report if I find anything.
Regards,

Esteban Franqueiro
[EMAIL PROTECTED] 




Re: DbDataStore implementation

2007-11-28 Thread Thomas Mueller
Hi,

 * doesn't synchronizing the addRecord() method, and using only one
 connection defeat one of the purposes of the data store of allowing
 maximum concurrency?

Yes that's true. Using the data store itself improves concurrency as
simple (non-blob) repository operations are not blocked by operations
that involve blobs. Using multiple connections could improve
concurrency and could even speed up the process (if the database
writes to multiple hard drives). So far I have not thought about that.
The question is: how important is this feature?

 * making the SQL strings private and not initializing them in a method of
 its own really complicates extending the implementation

Sorry, I committed the properties files to the wrong folder first!
I have fixed it now. The SQL statements can be overloaded in the
databaseType.properties file in
src/main/resources/org/apache/jackrabbit/core/data/db. Currently they
are not overloaded, but maybe they need to be. I have only tested
derby and H2 so far. initDatabaseType() loads the properties file.
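The override mechanism described here can be sketched as: defaults in code, with an optional <databaseType>.properties replacing individual entries. The names below are illustrative, not the actual initDatabaseType() code.

```java
import java.io.*;
import java.util.Properties;

// Sketch of per-dialect statement loading: start from built-in defaults,
// then let an optional properties file override individual keys.
class DialectProperties {
    static Properties load(InputStream override) {
        Properties p = new Properties();
        p.setProperty("blobType", "BLOB"); // default, e.g. for Derby or H2
        if (override != null) {
            try {
                p.load(override); // e.g. sqlserver.properties sets blobType=IMAGE
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return p;
    }
}
```

A database type that needs no overrides simply ships no properties file and gets the defaults.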

 (in any case, the SQL strings should be written as "UPDATE " + tableSQL
 + "DATASTORE SET DATA=? WHERE ID=?")

Both the table name and the SQL strings can be overloaded (in the
properties file), so building the SQL statements is not required in my
view.

 * during a Session.save() there are various calls to DbDataStore.getRecord()
 and DbDataRecord.getStream(), for storing the blob in the blobStore. Why is
 this necessary if the binary content is already in the data store? It seems
 that this copy is overwritten every time, but I don't see the reason for all
 these calls to the DB, and file copies.

That's not good; I'd like to solve this problem. Does this occur when
simply storing a node with a large object? If not, do you have a
simple test case?

Regards,
Thomas