[Geotools-devel] Re: Geotools mif

Luca Sigfrido Percich Mon, 26 Sep 2005 03:40:12 -0700

Hi Jukka!

Please, can you FW my reply to the udig list, as I'm not a member of
it? Meanwhile I CC the reply to Geotools-devel as the discussion
might be of interest. Thanx!!!

For up to date code and documentation, please refer to:

http://docs.codehaus.org/display/GEOTOOLS/MIFDataStore

and

http://svn.geotools.org/geotools/trunk/gt/plugin/mif/

On 26 Sep 2005 at 11:28, Jukka Sirviö wrote:

> we could start from separating the reading and writing processes, this
> way we can share the testing and developing burden more easily? What's
> your opinion?
> I suggest that the MIFFile class is split into more convenient
> pieces like: 1) MIFFileReader 2) MIFFileWriter
>
> probably also
> 3) MIFFeatureWriter
> 4) MIFFeatureReader

Yes, there's already a certain degree of separation: take a look at
the MIFFile private inner classes Reader (extends FeatureReader) and
Writer (extends FeatureWriter)... I've left them inside MIFFile
because they share some pieces of code, and I think they don't need
to be derived or accessed from outside MIFFile. The problem is that
if in the future a new type of geometry or geometry attribute is
defined for the MIF format, the class has to be modified, since no
"plugin" or "extension" model has been provided.

MIFFile has also been designed to be used standalone (i.e. outside a
MIFDataStore).

> JUnit testcases could be split also, for example when running the
> tests the writer and reader tests could be easily separated. I ran the
> current tests with MIF files of the normal size for our purposes (30 -
> 100 MB). I faced performance issues, probably mainly because the
> writing and reading are done simultaneously during the run of the
> JUnit test class.

Ok.

> How about the styling and text part of the Region element? Coordinates
> are straightforward to handle but how about the example below? Any
> ideas?

Well, we've already discussed this issue (but unfortunately nobody
seemed very interested in it); anyway, the idea is that keeping
style along with geographic data is bad practice. It's nearly the
same problem I faced when somebody asked me to support text
objects... text objects are not a database concept at all, they are
merely a "map drawing" issue. Supporting style at the row level for a
geometry makes as much sense as supporting font style at the row
level for a char field... font style, like object style, must derive
from formalized object properties which are accessible from the
database where the object is stored (i.e. all the polygons of
buildings belonging to this class are to be filled in red).

Please refer to:

http://jira.codehaus.org/browse/GEOT-653

On one hand, MapInfo allows for "bad practices" (IMHO). (Another bad
practice is supporting mixed geometry types in the same table, again
IMHO). On the other, style, text and other MapInfo features have no
corresponding entity in the JTS/GeoTools world. For the following
example, what we could do is similar to the approach for supporting
TEXT objects, i.e. creating one or more MIF_PROPERTY String fields in
which to store the information we read, so that we can write it back
when needed. But what happens when a new feature is created? And how
do we interpret the MIF style definition for rendering purposes?

> Region  1
>   19
> 3605482.86 6908238.08
> 3605461.4 6908257.01
> 3605451.29 6908270.91
>     Pen (3,2,8388736)
>     Brush (1,0,16777215)
>     Center 3605495.49 6908255.75

My final proposal (I'm a DataBase/Developer guy, not a cartographer
;o) is to ignore all the style info in the MIF format, and
concentrate on a better suited approach: automatically GENERATING an
SLD DESCRIPTOR from a MapInfo WORKSPACE FILE (.WOR).

The style definitions for thematic maps in a workspace file use the
same style objects as MIF files, so we would have to implement
classes which translate the MapInfo style descriptors (Symbol, Pen
and Brush) into GeoTools renderer styles.
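
As a starting point for such classes, here is a minimal Java sketch
(not code from the plugin; the class and method names are made up for
illustration). It assumes the usual MIF clause layout, Pen (width,
pattern, color) and Brush (pattern, forecolor, backcolor), with each
colour stored as the integer red*65536 + green*256 + blue, and turns
that integer into the "#RRGGBB" form an SLD stroke or fill parameter
expects. Mapping the pattern numbers to SLD is left open here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of the GeoTools MIF plugin.
final class MifStyleSketch {

    // Matches e.g. "Pen (3,2,8388736)" or "Brush (1,0,16777215)".
    // The third argument is optional (assumed: Brush may omit backcolor).
    private static final Pattern CLAUSE = Pattern.compile(
            "^\\s*(?:Pen|Brush)\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Returns the numeric arguments of a Pen/Brush clause, or null for any other line. */
    static int[] parseArgs(String line) {
        Matcher m = CLAUSE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        if (m.group(3) == null) {
            return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** Converts a MapInfo colour integer to the "#RRGGBB" notation used in SLD. */
    static String toHexColor(int mifColor) {
        return String.format("#%06X", mifColor & 0xFFFFFF);
    }
}
```

With the Region quoted above, parseArgs("    Pen (3,2,8388736)") gives
{3, 2, 8388736} and toHexColor(8388736) gives "#800080"; the Brush
backcolor 16777215 becomes "#FFFFFF".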

> Third, I did not find the support for multipolygons?

MultiPolygons are supported. MapInfo describes a linear ring, a
polygon with a hole and a multipolygon in the same way: they are
simply lists of linear rings. The MIFFile.Reader builds a Polygon or
MultiPolygon according to the preferences (for example it can cast
all the simple polygons to Multi) and to what it finds in the MIF
file.

> I suppose that the current approach is good as far as the architecture
> is concerned. The Tokenizer classes are well separated from the
> parsing. The actual String to be handled is always requested through
> the MIFFileTokenizer.getToken() method with several types of
> 'tokenizer' (or delimiter) parameter passed in (empty or ',' or ' ',
> '(', or ')'). That is of course a flexible way to do things but does
> it affect performance when parsing large files?

As I told you, I didn't check for performance. :o(

Surely the best performance improvement could be achieved by
modifying the MIFFile.Reader.readMIFCoordinate() method, as the MIF
format usually keeps one coordinate pair on each row. The problem is
when the MIF file is not laid out as expected, i.e.:

x1, y1, x2, y2

or

x1, y1,
x2, y2

are treated correctly by the current
MIFFile.Reader.readMIFCoordinate() method, but they probably wouldn't
work in the new version. Keep in mind that MIF files can also be
generated by third party software which may or may not write them
according to precise formatting rules.
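
To show what "tolerant" means here, a minimal Java sketch (this is
not the actual readMIFCoordinate() code; the class and method names
are invented for illustration). It reads numbers token by token
instead of row by row, so the layouts shown above give the same
result as one pair per row. It assumes that commas, spaces and tabs
are the only separators.

```java
import java.util.Iterator;
import java.util.StringTokenizer;

// Hypothetical helper: not the plugin's MIFFile.Reader.
final class CoordinateReaderSketch {

    /**
     * Reads `count` x/y pairs from the given rows, wherever the line
     * breaks fall. Returns an array of {x, y} pairs.
     */
    static double[][] readCoordinates(Iterator<String> rows, int count) {
        double[][] coords = new double[count][2];
        int filled = 0; // how many numbers have been stored so far
        while (filled < 2 * count && rows.hasNext()) {
            // Assumption: comma, space and tab are the only separators.
            StringTokenizer st = new StringTokenizer(rows.next(), " ,\t");
            // Numbers left over on a row after the last pair are ignored.
            while (st.hasMoreTokens() && filled < 2 * count) {
                coords[filled / 2][filled % 2] = Double.parseDouble(st.nextToken());
                filled++;
            }
        }
        if (filled < 2 * count) {
            throw new IllegalArgumentException(
                    "Expected " + count + " coordinate pairs, found " + (filled / 2));
        }
        return coords;
    }
}
```

A faster version could first try a plain "one pair per row" parse and
fall back to a loop like this only when the row does not hold exactly
two numbers.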

> of stuff we will get from mif -file? The basic delimiter is stored in
> mif file's header (as you have used), but I think we also know, or at
> least we can easily determine, the rest of the things in mif file
> without every time going through the variety of if -checkings
> (MIFStringTokenizer.getToken(char, boolean, boolean))?

While it's no use improving the parsing of the header (a few rows),
it's worth thinking about the MID files (the non-geometric Feature
attributes).
The problem of parsing the MID file lies in String fields which are
always delimited by double quotes, and where double quote chars are
escaped by doubling inside the string... like "A string containing a
""quoted string"""... substring would not work here, not in a
straightforward way I mean.

The different getToken methods allow you to skip parsing of double
quotes, and in fact use a plain substring() call if a non-quoted
string is expected.
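
A minimal Java sketch of both cases (this is not the
MIFStringTokenizer code; the names are made up for illustration): a
row without any double quote is cut with plain substring() calls,
while a row containing quotes is walked character by character so
that a doubled quote inside a string becomes one literal quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of the GeoTools MIF plugin.
final class MidLineSketch {

    /** Splits one MID row into its raw field values. */
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        // Fast path: no quotes at all, so substring() is enough.
        if (line.indexOf('"') < 0) {
            int start = 0;
            int end;
            while ((end = line.indexOf(delimiter, start)) >= 0) {
                fields.add(line.substring(start, end));
                start = end + 1;
            }
            fields.add(line.substring(start));
            return fields;
        }
        // Slow path: honour quoted fields and doubled quotes.
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" inside a string is one literal quote
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, the row 1,"A string containing a ""quoted string""",3.5
yields three fields, the second being: A string containing a "quoted
string".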

Take a look at the MIFValueSetter objects, which are currently used
for mapping MIF values into Feature attributes... they could also be
used for getting data from the MIF stream... so we could think of a
MIFValueSetter.getToken() method... this could give better
performance.

> This kind of approach could be a straightforward way to handle the
> parsing: - loop through the whole file one line at a time - parse line
> with StringTokenizer into smaller pieces - determine the type of the
> line from first token and do the necessary actions for the line based
> on that linetype

This is what is currently done, except that you don't parse the whole
file but one feature at a time, according to the FeatureReader /
FeatureWriter model. The MIFStringTokenizer.putToken() method was
added to allow a "look forward", useful if you want to read all the
style stuff (as I did for Text objects) instead of simply skipping it
when you read the next feature.
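
The "look forward" idea can be sketched in Java like this (a
simplified, line-based illustration with invented names; it is not
the MIFStringTokenizer API, which works on tokens). The reader
collects the style lines that follow a geometry and, as soon as it
meets a line that is not a style clause, hands that line back so the
next feature can start from it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: not part of the GeoTools MIF plugin.
final class PushbackLinesSketch {
    private final Iterator<String> source;
    private final Deque<String> pushedBack = new ArrayDeque<>();

    PushbackLinesSketch(Iterator<String> source) {
        this.source = source;
    }

    /** Returns the next line, or null at end of input. */
    String next() {
        if (!pushedBack.isEmpty()) {
            return pushedBack.pop();
        }
        return source.hasNext() ? source.next() : null;
    }

    /** Hands a line back so that the following next() call returns it again. */
    void putBack(String line) {
        pushedBack.push(line);
    }

    /** Collects the Pen/Brush/Center lines that follow a Region's coordinates. */
    List<String> readStyleLines() {
        List<String> style = new ArrayList<>();
        String line;
        while ((line = next()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Pen") || trimmed.startsWith("Brush")
                    || trimmed.startsWith("Center")) {
                style.add(trimmed);
            } else {
                putBack(line); // this line belongs to the next feature
                break;
            }
        }
        return style;
    }
}
```

Only the three clauses from the Region example are recognised here; a
real reader would need the full list of style keywords.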

> Yours:
> Jukka

Cheers

Sig

--
Luca Sigfrido Percich    ([EMAIL PROTECTED])
Agenzia Milanese Mobilità e Ambiente s.r.l. (http://www.ama-mi.it)
Direzione Sistemi Informativi e Modellistica
Via Beccaria, 19 - 20122 Milano - tel. +39 02 884.67.262
