Hi Ankit,

I think it is fair to assume that a CLOB will always be convertible to a
String - at least if memory on the client permits it. Normally a
java.sql.Clob provides a getCharacterStream() method which returns a Reader.
By setting the system property 'metamodel.jdbc.convert.lobs' to "true" we
offer the service of eagerly reading that Reader to the end and returning a
String. The benefit is that the client avoids the issues where closing the
DataSet, or sometimes even just moving the cursor to the next record,
invalidates the Reader. In other words: with the LOB conversion we offer a
way to keep the MetaModel Row objects usable even after the underlying
statement/resultset/connection is closed. It is of course optional.

The same idea is applied to BLOBs. We offer to automatically open the
java.sql.Blob InputStream and load it into a byte-array.
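
To illustrate from the client's perspective, here is a rough sketch of what
that looks like (table and column names are made up, 'dataSource' is just
some javax.sql.DataSource, and imports from org.apache.metamodel are omitted
for brevity):

  // enable eager LOB conversion before creating the data context
  System.setProperty("metamodel.jdbc.convert.lobs", "true");

  DataContext dataContext = new JdbcDataContext(dataSource);
  Table table = dataContext.getDefaultSchema().getTableByName("documents");
  Column content = table.getColumnByName("content"); // a CLOB column

  DataSet dataSet = dataContext.query().from(table).select(content).execute();
  try {
      while (dataSet.next()) {
          // with conversion enabled the value arrives as a String
          // (or byte[] for BLOBs) instead of a java.sql.Clob, and it
          // stays usable after the DataSet is closed
          String text = (String) dataSet.getRow().getValue(content);
      }
  } finally {
      dataSet.close();
  }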

In my opinion the fact that something is a CLOB or a VARCHAR is a different
kind of information than whether its values are represented as
java.sql.Clob or String objects. The information on the column type is
valuable in its own right, and I would not want us to "make believe" by
presenting a column which is *really* a CLOB as a VARCHAR.

Now that we're on the topic, we might also consider whether using the JDBC
data type names is even appropriate as a default behaviour. We're acting in
the Java world, and a type like VARCHAR or TINYINT etc. is only used in SQL,
not in native Java apps. Other kinds of datastores have different data
types, and I guess many of them only have simple data types like String,
Integer, Date etc. We could choose a non-enum solution for ColumnType and
thereby also provide more flexibility. In the CSV module we could then
provide just a single ColumnType, e.g.:

  public static final ColumnType CSV_COLUMN_TYPE =
      new ColumnTypeImpl("String", String.class);

Read the example constructor arguments like this: the name would be "String"
(or maybe "Text" is more appropriate for "text files"?) and the expected Java
class is String.class.
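
Just to make the idea concrete, a minimal ColumnTypeImpl could look roughly
like this (only a sketch - the remaining methods of a would-be ColumnType
interface are left out):

  public class ColumnTypeImpl implements ColumnType {

      private final String _name;
      private final Class<?> _javaEquivalentClass;

      public ColumnTypeImpl(String name, Class<?> javaEquivalentClass) {
          _name = name;
          _javaEquivalentClass = javaEquivalentClass;
      }

      public String getName() {
          return _name;
      }

      public Class<?> getJavaEquivalentClass() {
          return _javaEquivalentClass;
      }

      // ... plus whatever else the ColumnType interface ends up requiring
  }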

In the JDBC module we could have other column types available, e.g.:

  public static final ColumnType VARCHAR =
      new ColumnTypeImpl("VARCHAR", String.class);
  public static final ColumnType CLOB =
      new ColumnTypeImpl("CLOB", java.sql.Clob.class);
  public static final ColumnType CLOB_CONVERTED =
      new ColumnTypeImpl("CLOB", String.class);
  public static final ColumnType BLOB =
      new ColumnTypeImpl("BLOB", java.sql.Blob.class);
  public static final ColumnType BLOB_CONVERTED =
      new ColumnTypeImpl("BLOB", byte[].class);

(and there you have my suggested solution to the Clob issue: use a String
for the name and a Class for the expected Java type)
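
From a client's point of view the two pieces of information then stay
consistent. A rough example (assuming 'content' and 'row' come from a query
like the sketch earlier in this mail):

  ColumnType type = content.getType();

  // the name still tells you the column is "really" a CLOB ...
  System.out.println(type.getName()); // "CLOB"

  // ... while the expected Java class reflects whether LOB conversion is on
  if (type.getJavaEquivalentClass() == String.class) {
      String value = (String) row.getValue(content);
  } else {
      java.sql.Clob value = (java.sql.Clob) row.getValue(content);
  }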

I can try to make a patch for this solution. That would sort of validate
whether it works or not. But first ... what do you guys think?

The only pitfall I see is that it breaks backwards compatibility. I think we
can retain most of the compile-time compatibility, but not runtime
compatibility (as in "drop in a new jar without recompiling"). Also, old
objects serialized with the old enums would not be properly deserialized
into these constants. Maybe that can be solved in the
LegacyDeserializationObjectInputStream [1] somehow.
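
Pure speculation, but one possible direction (it assumes we keep the old
enum around under a legacy name purely for deserialization, that
resolveObject is enabled in the stream, and that ColumnTypeImpl gets some
name-based lookup like a valueOf method):

  // hypothetical addition to LegacyDeserializationObjectInputStream;
  // requires enableResolveObject(true) to be called in its constructor
  @Override
  protected Object resolveObject(Object obj) throws IOException {
      if (obj instanceof LegacyColumnType) {
          // map the old enum constant to the new ColumnType constant by name
          return ColumnTypeImpl.valueOf(((LegacyColumnType) obj).name());
      }
      return super.resolveObject(obj);
  }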

Best regards,
Kasper


[1] Described here:
http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel


2014-04-01 10:28 GMT+02:00 Ankit Kumar <[email protected]>:

> Hi Kasper,
>
> Thanks for initiating this discussion as we are hit by this in MM right
> now.
>
> Thinking a bit from the perspective of an MM client, I guess the CLOB
> could contain an Object/File/Image/???/large String text, so while getting
> CLOBs from the real database we could still check the content, and if it
> smells more like a String then we return VARCHAR, otherwise an Object.
> Maybe this is not the neatest solution but I guess for a client this
> provides the flexibility needed.
>
> An additional point worth mentioning with respect to the above remark:
> playing with different databases, we see that some allow large text to be
> stored in VARCHAR-like fields, whereas others, like Oracle, still have a
> limit of 4000 characters. So when using MM as the abstraction layer above
> the databases this feature might be quite handy for a client.
>
> Excuse me if I sound a bit too biased here.
>
> Regards
> Ankit
>
>
> On Tue, Mar 25, 2014 at 3:54 PM, Kasper Sørensen <
> [email protected]> wrote:
>
> > Hi all,
> >
> > I just came across a potential issue in MetaModel's design and want to
> > share it and maybe start thinking of ways to fix or work around it...
> >
> > In our ColumnType enum we have the method getJavaEquivalentClass() which
> > is used to tell which Java type to expect when querying a particular
> > column. For instance, if I query a VARCHAR column, I can expect a
> > java.lang.String value.
> >
> > Now, in our JDBC module we have a system property which allows for eager
> > loading of BLOBs and CLOBs so that they are automatically read into
> > byte-arrays and Strings respectively. This is a great utility because
> > otherwise the user has to do a lot of tedious work with input streams
> > etc., which in most cases isn't particularly useful - usually you just
> > want the byte[] or String.
> >
> > The trouble is that if you turn that system property on, you get Strings
> > or byte-arrays but the column type is still CLOB/BLOB, and that means
> > the "expected equivalent Java class" is still java.sql.Clob or
> > java.sql.Blob! If you build your code to expect that, it will eventually
> > break because you get a String or a byte-array instead.
> >
> > How to make this consistent? I can think of a few ways, but none that I
> > really love:
> >
> > 1) Probably the easiest way is to let the JDBC DataContext give the
> > columns other ColumnTypes. But it doesn't feel right that a column whose
> > "real" data type is CLOB would then be called e.g. VARCHAR.
> >
> > 2) We can remove getJavaEquivalentClass() from ColumnType and instead
> > make it a direct member of the Column class. This will break backwards
> > compatibility of the API.
> >
> > 3) We can make ColumnType an interface or class instead of an enum. Then
> > the behaviour can simply be plugged in by the JDBC DataContext. I like
> > this approach the best for many reasons, but it has the downside that it
> > also breaks backwards compatibility of the API and that there will no
> > longer be an enumerable and finite list of ColumnTypes. Maybe we could
> > alleviate that problem by ALSO having an enum with the typical
> > implementations or something like that.
> >
> > Maybe there are other solutions that I didn't think of.
> >
> > Regards,
> > Kasper
> >
>
