Jody, yes, I read the contribution policy; no questions.

Andrea, I guess they say "Code Page" because that was the terminology used in the early days of dBASE, which was originally developed for the CP/M operating system. Traditionally, the *.dbf-file contains a special byte indicating one of a few available code pages.

Now, surprise surprise, I found that the proposed feature has already been implemented. After setting

   // System.setProperty takes two Strings, so the value must be "true", not the boolean true
   System.setProperty(ShapefileDataStoreFactory.ENABLE_CPG_SWITCH, "true");

and leaving the DBFCHARSET param unset, the charset is read from the *.cpg-file if present.
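
To illustrate the idea of taking the charset from the *.cpg-file, here is a minimal sketch in plain Java. It is not the GeoTools implementation; the class and method names (CpgCharsetSketch, parseCharsetName, readCpg) are made up for this mail, and it assumes the file contains a name that Java's Charset.forName recognizes, such as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper, for illustration only (not part of GeoTools).
public class CpgCharsetSketch {

    /** Resolves the text of a .cpg file to a Charset, or empty if Java does not know the name. */
    public static Optional<Charset> parseCharsetName(String cpgContent) {
        try {
            // Trim whitespace and line endings around the name.
            return Optional.of(Charset.forName(cpgContent.trim()));
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return Optional.empty();
        }
    }

    /** Reads the .cpg file if it exists; returns empty if it is missing or unreadable. */
    public static Optional<Charset> readCpg(Path cpgFile) {
        if (!Files.isRegularFile(cpgFile)) {
            return Optional.empty();
        }
        try {
            byte[] bytes = Files.readAllBytes(cpgFile);
            return parseCharsetName(new String(bytes, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

A caller could fall back to its own default when the result is empty, for example readCpg(path).orElse(StandardCharsets.ISO_8859_1).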


But I have more wishes. The next one is: Some data providers deliver gzipped shapefiles (*.shp.gz/*.shx.gz/*.dbf.gz/*.prj.gz instead of the normal *.shp/*.shx/*.dbf/*.prj). One such data provider is tomtom.com, which provides street map data. To support reading such data, roughly 20-30 lines across about three source files would have to be changed or added. I would add unit tests including test-data.
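
As a rough sketch of what I mean (plain Java, not the actual GeoTools change): a file whose name ends in .gz is wrapped in a GZIPInputStream, anything else is opened as is. The class and method names (GzShapefileSketch, isGzipped, open) are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, for illustration only (not part of GeoTools).
public class GzShapefileSketch {

    /** True if the file name carries a .gz suffix, e.g. roads.shp.gz. */
    public static boolean isGzipped(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /** Opens the file, transparently decompressing it when it is gzipped. */
    public static InputStream open(Path file) throws IOException {
        InputStream in = new BufferedInputStream(Files.newInputStream(file));
        if (!isGzipped(file.getFileName().toString())) {
            return in;
        }
        try {
            return new GZIPInputStream(in);
        } catch (IOException e) {
            in.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }
}
```

Note that a GZIPInputStream can only be read sequentially, so code that relies on random access into the files would need a different approach.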

What do you think?

Regards
Burkhard


On 15.09.2023 at 04:56, Andrea Aime wrote:
That sounds great, but at the same time I'm a bit confused. The ESRI specification <https://desktop.arcgis.com/en/arcmap/10.3/manage-data/shapefiles/shapefile-file-extensions.htm> claims that the file contains a "codepage <https://en.wikipedia.org/wiki/Code_page>", while Java needs a Charset <https://en.wikipedia.org/wiki/Code_page>. And yet, if I look at the cpg files I have locally, the contents are either UTF-8 or Windows-1251, so indeed, charsets.

Can you shed some light on this?

Cheers
Andrea

On Thu, Sep 14, 2023 at 9:46 PM Burkhard Strauss <servi...@strauss.eng.br> wrote:

    Some providers of ESRI Shapefile data provide an additional *.cpg-file
    containing the character set name for string-fields in the *.dbf-file.
    One such provider is HERE/NavStreets street map data.

    Currently the API-user has to specify a Charset to the
    ShapefileDataStoreFactory to ensure that string values are read
    properly. The application has to ask the application-user to look up
    the Charset in the *.cpg-file and copy or type the name. That's rather
    awkward.

    I prepared a solution which adds 15 lines to ShapefileDataStoreFactory
    plus a unit-test and test-data. If a file named <my_shapefile>.cpg is
    present beside <my_shapefile>.dbf and the other files, and the Charset
    can be determined from the file without error, a possibly present
    factory parameter is ignored and overridden by the Charset found in
    the file.

    What do you think?

    Regards
    Burkhard


    _______________________________________________
    GeoTools-Devel mailing list
    GeoTools-Devel@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/geotools-devel



--

Regards,

Andrea Aime

