Hello Marc

Thanks for the reply.

Le 22/12/14 17:21, Marc Le Bihan a écrit :
> Currently the Database class only has for public methods :
>    public Charset getCharset()
>    public int getRecordCount()
>    public ArrayList<FieldDescriptor> getFieldsDescriptor()
>    public File getFile()
>    public int getRowNum()

I noticed that the 'recordCount' field is also public. Why having both a
public field and a public 'getRecordCount()' method? Shouldn't the field
be private?

Should we rename getRecordCount() as getRowCount() for making the
relationship with getRowNum() clearer? It would also be consistent with
JDBC terminology (e.g. ResultSet.getRow()). Maybe we should also rename
getRowNum() as getCurrentRow(), otherwise it sound a little bit like a
synonymous of getRowCount().

I wonder why getFieldsDescriptor() returns an ArrayList implementation
instead of the List interface? Also, isn't FieldDescriptor internal
mechanics? What would be the use case for exposing it to the users?

About the getFile() method: couldn't the file be a URL? Maybe a Java 7
Path object would be more generic, but we would have to omit that method
on the JDK6 branch.


> and to allow it to be used from ShapeFile class, these ones :
>    public void close()
>    public boolean isClosed()
>    public void loadRowIntoFeature(Feature feature)
>
> and for being used from DBFRecordBasedResultSet class :
>    public HashMap<String, Object> readNextRowAsObjects()

Why returning the HashMap implementation instead than the Map interface?


> Eventually, the close, isClosed, loadRowIntoFeature, and
> readNextRowAsObjects will disapear because our refactoring will lead
> to a more stable/nice state.
> Until this time, I suggest only to put a @deprecated annotation on
> these four methods and on the @Deprecated javadoc mention that they be
> removed soon during next refactoring and that they will not be seen in
> 0.6 version.

What about the following alternative?

  * Do not expose Database at all, but expose only the JDBC interfaces
    instead. I think that we can expose most of the API through the JDBC
    interface. Even the FieldDescriptor, which could be exposed through
    the java.sql.DatabaseMetadata interface.
  * If exposing only the JDBC interfaces is not doable, define Database
    as an abstract class with only the API that we want to keep, and
    move the implementation in the internal packages.


> For FieldDescriptor class these methods exist :
>    public int getDecimalCount()
>    public int getLength()
>    public String getName()
>    public DataType getType()
>
>    But I see, you are right the members being public. I didn't
> refactor that when I saw them.
>    Let me try to do a change on them this morning.

Thanks. But ideally, I think it would be nice to hide this class and
expose its information though the JDBC DatabaseMetadata interface instead.


> "Various numerical codes internal to the DBase format" :
> codepage values are not visible, else I've done a mistake ?

The DbaseVersion, DbaseHeaderBytes and DbaseRecordBytes are public
fields in my checkout, while I think they are of use only for the
Database class (I don't see why a user would want to know the length of
the DBase file header, unless he is implementing his own reader). The
DbaseLastUpdate field is an array of bytes, while I think end-users
would rather expect a Date object.


> I think that if they were I wouldn't be so trouble some. The DBase
> Header could be public (even if it will not be), because it's the
> Dbase format, and if a developper needed these metadata coming from
> the end of the 80's he could find them by our API).

I think that if a user want to go in such low-level details, he would
probably implement his own shapefile reader. I would rather suggest a
very conservative approach regarding the public API, on the basis that
it is very easy to add new API in the future if we realize that there is
really a need for it, but impossible (if we want to preserve
compatibility) to remove them.

Joshua Blosh (the author of "Effective Java") has a nice talk on "How to
design a good API, and why it matter" (available on Youtube). This is a
one hour talk, but he insists quite strongly on a single golden rule. He
said that if we have only one thing to remember from his whole talk,
this is that one: "in case of doubt, leave it out". The need to give to
users an access to such internal details as the length of the DBase file
header seems quite uncertain to me, so applying Joshua's principle I
would like that we leave that out for now. It can be added later if
really needed.


> MappedByteBuffer is not public but has package level, and won't be
> shown to users. But I understand what you mean.

Database has a public getByteBuffer() method on my checkout... Maybe my
checkout is outdated, I will update later.


> For Shapefile object, I have refactored nothing :
>    Do you want me to try to put all members private and put only some
> public getters on them ?

That would be nice if you can do that. But the public getters would not
necessarily match the fields. For example I don't think that we should
put getXMin(), getYMin(), getXMax(), getYMax() methods. Instead, we
should have a single getGeographicBoundingBox() method.

However maybe even the above-cited getGeographicBoundingBox() method
should be omitted for now, and a more generic getMetadata() method
declared instead. A getMetadata() implementation proposal is available
in a patch attached to one of the JIRA tasks. The reason is that in many
cases, the user would not even know that he is handling a shapefile. He
will not have (and do not want to have) access to our Shapefile class.
We will instead be using a more abstract class (DataStore) which could
be Shapefile, NetCDF or many other data formats. So the API that matter
the most is a very generic API - everything specific to Shapefile should
be avoided as much as possible.

    Martin

Reply via email to