Michael,

Your comments were very helpful. Please see my responses below.

You wrote: "I think that a light-weight feature class or FeatureOnDemand is
a good
solution, as well as a FeatureCache."

I'm glad you think so. I like the term "FeatureOnDemand". Do you mind if I
use it as the name of the light-weight feature class?

You wrote: "I already tested Agile's scalable shapefile driver, and I'm
currently
implementing something similar for GeoConcept format(a commercial gis).
It can save a lot of memory (but as you guess, is not very good for
performance unless we find very well designed solutions)
I've not yet seen how kosmo implemented their scalable shapefile driver,
but I'll have to, because it is not only scalable, it is also writable !"

The FeatureCache will be writable as well. The advantage over the scalable
shapefile driver used by Agile, UDig and (maybe) Kosmo is that we'll be able
to use the FeatureCache with any data source that can provide Features. For
example, after I get the FeatureCache working with the GeoTools Shapefile
drivers I want to get it working for Autodesk's DXF format as well. The
other benefit is that we can support storage of data not supported by the
ESRI Shapefile format if we choose to do so in the future.
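To be concrete about "any data source that can provide Features": the hook I
have in mind is something like the tiny interface below. The FeatureSource
name and its single method are placeholders I made up for this sketch, not
existing OpenJUMP or GeoTools code; a shapefile reader, a DXF reader, or a
database connection would each sit behind an implementation of it.

    import java.util.Iterator;

    // Hypothetical interface: anything that can hand out OpenJUMP Feature
    // objects one at a time could feed the FeatureCache, regardless of the
    // file format or database it reads from.
    public interface FeatureSource {
        // Expected to yield com.vividsolutions.jump.feature.Feature instances.
        Iterator getFeatureIterator();
    }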

You wrote: "What must the in-memory representation of the light-weight
feature
include ?

The minimum is an identifier and a file adress for disk-access (unless
you store data in a database)"

Almost, but not quite. I was only going to store a numeric identifier for
the Feature, like a serial number, which I would probably store in an
integer or a long. The only other item I would store is perhaps a string
with the name of the FeatureCache containing the Feature. I think this is
about as lightweight as you can get.
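To make sure we are picturing the same thing, here is a rough sketch of that
light-weight class. The FeatureCache interface, the registry, and the method
names below are all placeholders invented for illustration; the real class
would implement OpenJUMP's Feature interface and forward every method the
way getGeometry() does here.

    import java.util.HashMap;
    import java.util.Map;
    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jump.feature.Feature;

    // Hypothetical cache API for this sketch only; the real FeatureCache
    // would implement OpenJUMP's FeatureCollection interface.
    interface FeatureCache {
        Feature getFeature(long id);
    }

    // Toy registry so a cache can be found from its name alone.
    class FeatureCacheRegistry {
        private static final Map caches = new HashMap();
        static void register(String name, FeatureCache cache) {
            caches.put(name, cache);
        }
        static FeatureCache getCache(String name) {
            return (FeatureCache) caches.get(name);
        }
    }

    // The light-weight feature: only the serial number and the owning
    // cache's name live in memory; everything else is fetched on demand.
    class FeatureOnDemand {
        private final long id;
        private final String cacheName;

        FeatureOnDemand(long id, String cacheName) {
            this.id = id;
            this.cacheName = cacheName;
        }

        // Every accessor would be forwarded like this one.
        public Geometry getGeometry() {
            return FeatureCacheRegistry.getCache(cacheName)
                    .getFeature(id).getGeometry();
        }
    }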

You wrote: "but imo the bounding box has also to be in-memory for
performance
reasons (just wonder if it is worth trying to store the bb in a
structure smaller than 4 doubles)"

I didn't think about this. Could you please tell me why you think it will be
important to keep the bounding box of the feature in memory? Is this for
rendering purposes? Remember that we will need to put every feature into
memory for rendering anyway, so I don't know if this will save us anything,
unless the bounding box is used for another frequent operation.
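(To make my question concrete: if the point is that an in-memory bounding box
lets the FeatureCache decide which features a zoomed-in view or a spatial
query actually touches before restoring any geometry from disk, then I
picture something like the sketch below. The class and method names are
invented for illustration only.)

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import com.vividsolutions.jts.geom.Envelope;

    // Hypothetical in-memory entry: the id plus the feature's envelope
    // (4 doubles, 32 bytes per feature).
    class LightFeatureEntry {
        long id;
        Envelope bbox;

        LightFeatureEntry(long id, Envelope bbox) {
            this.id = id;
            this.bbox = bbox;
        }
    }

    class BoundingBoxFilter {
        // Returns the ids of features whose boxes intersect the query
        // envelope; only those would need to be pulled into the buffer.
        static List idsIntersecting(List entries, Envelope query) {
            List result = new ArrayList();
            for (Iterator it = entries.iterator(); it.hasNext();) {
                LightFeatureEntry e = (LightFeatureEntry) it.next();
                if (query.intersects(e.bbox)) {
                    result.add(new Long(e.id));
                }
            }
            return result;
        }
    }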

You wrote: "Another question you ask is about data format. Sigle project is
exploring GML format storage for direct access. I think you can also
keep the data in the original file format (this is the way scalable
shapefile works, and the way I am exploring with geoconcept format). "

Please see my comments on using the original file format above. I had
originally intended to use my XML parser and a storage format similar to
GML, but I was reading about the foolishness of the Java-to-XML-to-Java
conversion. XML is a great way to transfer information between systems and
different programming languages, but it isn't the most efficient way to
serialize objects. If we can export the features out of OpenJUMP in an
XML-based format anyway, it might not make sense to store them in the
FeatureCache as XML. That leaves me with the choice of a custom binary
format or Java's default serialization format. Java's serialization is a
lot simpler to implement, but it suffers from some versioning problems.
This may not be a big issue, as I don't think we will change the Feature
interface very often, but it is one that should at least be considered. (I
don't really want to come up with a custom binary storage format, but it
may be the best solution...)
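To make that trade-off concrete, here is roughly what one record in a custom
binary format could look like, with a format version number up front so a
future reader can cope with changes to the Feature interface. The attribute
set shown is just an example I made up, not the real Feature schema.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    class FeatureRecordWriter {

        static final int FORMAT_VERSION = 1;

        static byte[] write(long id, String name, double area,
                byte[] wkbGeometry) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(FORMAT_VERSION); // lets a future reader branch on version
            out.writeLong(id);            // the feature's serial number
            out.writeUTF(name);           // example string attribute
            out.writeDouble(area);        // example numeric attribute
            out.writeInt(wkbGeometry.length);
            out.write(wkbGeometry);       // geometry stored as WKB
            out.flush();
            return bytes.toByteArray();
        }
    }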

After looking at your tests of JTS reading in WKT and WKB formats I can see
that using text as the storage format really isn't a good option. The binary
storage format is so much faster!

I'll have to give this problem a lot more thought. Perhaps I can get a
temporary FeatureCache system running with Java's standard object
serialization, and work on the custom binary format after that.
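Something like the sketch below is probably all the temporary version needs.
The SerializableFeature class is a made-up stand-in for whatever snapshot of
a Feature we end up serializing; declaring serialVersionUID explicitly
softens, but does not remove, the versioning problem.

    import java.io.*;

    class SerializableFeature implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
        byte[] wkbGeometry;   // geometry already encoded as bytes
        Object[] attributes;  // must hold only serializable values
    }

    class SerializationStore {
        static void save(SerializableFeature f, File file) throws IOException {
            ObjectOutputStream out =
                    new ObjectOutputStream(new FileOutputStream(file));
            try {
                out.writeObject(f);
            } finally {
                out.close();
            }
        }

        static SerializableFeature load(File file)
                throws IOException, ClassNotFoundException {
            ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(file));
            try {
                return (SerializableFeature) in.readObject();
            } finally {
                in.close();
            }
        }
    }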

I'll have to take a look at the WKB format. Maybe we can base a binary
format for Feature attribute values on a similar system.
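For the geometry half this should be straightforward, since JTS 1.8 already
ships WKBWriter and WKBReader; it is only the attribute values that would
need an encoding of our own. A minimal round trip looks like this:

    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.io.ParseException;
    import com.vividsolutions.jts.io.WKBReader;
    import com.vividsolutions.jts.io.WKBWriter;

    class WkbRoundTrip {
        // Encode a geometry to the same binary form Michael benchmarked.
        static byte[] toBytes(Geometry g) {
            return new WKBWriter().write(g);
        }

        // Decode it again when the feature is restored from the cache.
        static Geometry fromBytes(byte[] wkb) throws ParseException {
            return new WKBReader().read(wkb);
        }
    }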

Thanks again for your comments. They were very helpful.

Thanks to Erwan as well.

The Sunburned Surveyor




On 3/29/07, Michaël Michaud <[EMAIL PROTECTED]> wrote:

Hi sunburned,

I think that a light-weight feature class or FeatureOnDemand is a good
solution, as well as a FeatureCache.
I already tested Agile's scalable shapefile driver, and I'm currently
implementing something similar for the GeoConcept format (a commercial
GIS). It can save a lot of memory (but, as you guess, it is not very good
for performance unless we find very well designed solutions).
I've not yet seen how Kosmo implemented their scalable shapefile driver,
but I'll have to, because it is not only scalable, it is also writable!
Some questions are:
- What must the in-memory representation of the light-weight feature
include? The minimum is an identifier and a file address for disk access
(unless you store data in a database), but IMO the bounding box also has
to be in memory for performance reasons (I just wonder whether it is worth
trying to store the bounding box in a structure smaller than 4 doubles).
- Another question you ask is about the data format. The Sigle project is
exploring GML storage for direct access. I think you can also keep the
data in the original file format (this is the way the scalable shapefile
driver works, and the way I am exploring with the GeoConcept format). But
storing data in JUMP's own format may be useful to solve performance
issues, or to solve the data access problem in a more independent way.
For this issue, I made some tests to compare WKB and WKT reading (and
also writing). Sorry, I did not test serialization, which, I think, does
not perform very well. Here are my results with JTS 1.8 (every test was
made on my personal laptop):

Reading 100 complex WKT polygons (about 7000 points each): 26590267 bytes, 15.073 sec
Reading 1 000 000 WKT points sequentially: 64489511 bytes, 47.874 sec

Reading 100 complex WKB polygons (about 7000 points each): 26590267 bytes, 1.313 sec
Reading 1 000 000 WKB points sequentially: 64489511 bytes, 2.542 sec

Some more tests for database access (binary geometry):
PostgreSQL, sequential access:   10 000 pts, 0.3 sec
PostgreSQL, random access:       10 000 pts, 7 sec
H2, sequential or random access: 10 000 pts, 0.4 sec

Michaël

Sunburned Surveyor wrote:

> I've been working on a solution to the problem of working with very
> large datasets in OpenJUMP at home the past couple of weeks. (For
> those of you that don't know, OpenJUMP reads all features in from a
> data source into memory. This isn't a problem until you start working
> with some very large datasets. For example, OpenJUMP runs out of
> memory before it can open the shapefile with all of the parcels in my
> county. The size of the data source OpenJUMP can work with is limited
> by the RAM of the computer it is running on.) I'd like
> to give a brief explanation of how this system will work, and then ask
> for some suggestions on an aspect of the design.
>
>
>
> This system uses a very light-weight in-memory representation of the
> Feature class. (This is required because portions of OpenJUMP's code
> require the ability to manipulate individual features, or all the
> features in a feature collection, "in-memory".) Objects of this
> light-weight Feature class are really a façade and forward all method
> calls to a FeatureCache object. A FeatureCache is an implementation of
> the FeatureCollection interface that actually manages the data behind
> the light-weight Feature objects.
>
>
>
> The FeatureCache maintains a "buffer". In this buffer it stores
> in-memory representations of regular OpenJUMP Feature objects. The
> buffer will only grow to a maximum size that can be set by the user,
> based on the desired balance between performance and memory usage.
> When a method call is made to a light-weight Feature object it is
> forwarded to the FeatureCache. The FeatureCache passes the call to
> the regular Feature object if it is in the buffer. If it is not in the
> buffer, the Feature object is created in memory from information in
> permanent storage, or "on disk". The method call is then processed and
> the newly created Feature is placed in the buffer. If the buffer is
> already at its limit, the oldest Feature in the buffer is stored back
> to permanent storage and removed from the buffer.
>
>
>
> There should be no major distinction between Features and a
> FeatureCollection implemented by a FeatureCache and normal Features
> and FeatureCollections stored entirely in memory. The only significant
> difference will be the speed of operations and rendering, which will
> be slower with this system than with Features and FeatureCollections
> kept entirely in memory. However, it will make it possible to work
> with very large datasets.
>
>
>
> Here is the part of the system that I would like to get some
> suggestions on. I need to decide on a storage format for the features
> placed in permanent storage, or on disk. I think I have 3 choices.
>
>
>
> [1] Java's Standard Object Serialization Format
>
> [2] A custom binary storage format.
>
> [3] A text based format.
>
>
>
> I believe the first two formats will be much quicker than the third. I
> don't really think the second format is something I want to do,
> because I think cooking up a custom binary format will be a real pain
> in the neck. So I need to decide between the first format listed and
> the third format listed.
>
>
>
> If I use a text-based format external tools will be able to easily
> work with the FeatureCache, and I won't have to worry about versioning
> issues. It will also be slower. If I use Java's standard object
> serialization format I'll have better performance, but I'll have to
> worry about versioning issues that might come up if we change the
> interface definition for the Feature interface. It will also make it
> difficult for external tools, especially those that aren't written in
> Java, to work with the data in the FeatureCache.
>
>
>
> I'd like to know what storage format the other developers would
> recommend and why.
>
> Thanks,
>
> The Sunburned Surveyor
>



