Dear All,

I've been trying a few solutions to efficiently convert GeoJSON into a
shapefile without having to store all features in memory. I'm using
GeoTools 9.2.

The problem is not so much how to stream the JSON as how to
efficiently write the features into the shapefile. I use
FeatureJSON#streamFeatureCollection to obtain an iterator. After some
googling, I found three different ways of writing a shapefile, namely:

1. Repeatedly calling FeatureStore#addFeatures with a collection
containing, say, 1000 features, within a transaction.
      -----
      ListFeatureCollection coll = new ListFeatureCollection(type, features);
      Transaction transaction = new DefaultTransaction("create");
      featureStore.setTransaction(transaction);
      try {
        featureStore.addFeatures(coll);
        transaction.commit();
      } catch (IOException e) {
        transaction.rollback();
        throw new IllegalStateException(
            "Could not write some features to shapefile. Aborting process", e);
      } finally {
        transaction.close();
      }
      -----


This option is extremely slow. By profiling a few runs, I noticed that
about 50% of CPU time is spent on the method
ContentFeatureStore#getWriterAppend, presumably in order to reach the
end of the file before each transaction commit.
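
To put a rough number on that, here is a back-of-the-envelope sketch.
It assumes (this is only my reading of the profile, not something
confirmed from the GeoTools source) that every new append writer has
to step over all records already in the file before it can add more.
AppendCostSketch and recordsSkipped are hypothetical names used for
illustration only.

```java
public class AppendCostSketch {

    // Counts how many already-written records are stepped over in total
    // when `total` features are written in batches of `batchSize`,
    // ASSUMING each batch opens a fresh append writer that walks past
    // every record written so far (an assumption, not verified).
    public static long recordsSkipped(long total, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long skipped = 0;
        long written = 0;
        while (written < total) {
            skipped += written; // walk to the current end of the file
            written += Math.min(batchSize, total - written);
        }
        return skipped;
    }

    public static void main(String[] args) {
        // One million features, 1000 per transaction
        System.out.println(recordsSkipped(1_000_000L, 1_000L));
        // One million features in a single pass
        System.out.println(recordsSkipped(1_000_000L, 1_000_000L));
    }
}
```

Under that assumption, one million features in batches of 1000 would
mean 499,500,000 record skips, against none for a single pass, which
would be consistent with the append step dominating the profile.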

2. Obtaining an append writer directly from ShapefileDataStore and
writing 1000 features at a time within a transaction.

This option suffers from the same problem as option 1.
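
For reference, in options 1 and 2 the 1000-feature chunks come from
draining the JSON iterator in fixed-size batches, so that only one
batch is held in memory at a time. A minimal, library-free sketch of
that batching follows. Batches and nextBatch are hypothetical names,
and a plain java.util.Iterator stands in for the GeoTools feature
iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class Batches {

    // Pulls up to `size` elements from the iterator into a new list.
    // Returns an empty list once the iterator is exhausted.
    public static <T> List<T> nextBatch(Iterator<T> it, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<T> batch = new ArrayList<>(size);
        while (batch.size() < size && it.hasNext()) {
            batch.add(it.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<Integer> it = Arrays.asList(1, 2, 3, 4, 5).iterator();
        List<Integer> batch;
        // Each non-empty batch would be handed to addFeatures (option 1)
        // or written through an append writer (option 2).
        while (!(batch = nextBatch(it, 2)).isEmpty()) {
            System.out.println(batch);
        }
    }
}
```

The batching itself is cheap; the cost I am seeing comes from what
happens per batch on the shapefile side.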

3. Obtaining a feature writer from ShapefileDataStore and writing one
feature at a time using Transaction.AUTO_COMMIT.

      -----
      FeatureWriter<SimpleFeatureType, SimpleFeature> writer = shpDataStore
          .getFeatureWriter(shpDataStore.getTypeNames()[0],
              Transaction.AUTO_COMMIT);
      try {
        while (jsonIt.hasNext()) {
          SimpleFeature feature = jsonIt.next();
          // the writer hands back the feature to populate
          SimpleFeature toWrite = writer.next();
          // copy each attribute by name from the JSON feature
          for (int i = 0; i < toWrite.getType().getAttributeCount(); i++) {
            String name = toWrite.getType().getDescriptor(i).getLocalName();
            toWrite.setAttribute(name, feature.getAttribute(name));
          }
          writer.write();
        }
      } finally {
        // close the writer even if a write fails
        writer.close();
      }
      -----


Option 3 is the fastest, but I feel there should be a way of efficiently
adding a larger number of features at a time to the shapefile within
a transaction. On the other hand, a previous comment on this list
noted:

> The above would work for mid-sized data transfers, for massive ones against
> databases it's better to adopt some sort of batching to avoid having a single
> transaction with one million inserts, e.g., insert 1000, commit the
> transaction, insert another 1000, and so on.
> This would work better against databases and against WFS servers,
> but not against shapefiles, which instead work better with the massive
> insert... to each his own.

Does this mean that the most efficient way of writing to a shapefile
is to have all features in memory, rather than being able to append
features?
I would appreciate it if someone could suggest a better way of achieving
this or point to any documentation that would help me.

Best regards,

Will

_______________________________________________
GeoTools-GT2-Users mailing list
https://lists.sourceforge.net/lists/listinfo/geotools-gt2-users
