Moises,

please file 2 issues in the GitHub issue tracker:

- one about /vsistdout/ where .csvt and .prj content shouldn't be emitted

- one about decoupling the layer GEOMETRY_NAME creation option with CREATE_CSVT=YES

Even

On 04/05/2023 at 13:58, Moises Calzado via gdal-dev wrote:
Hi Robert!

I think we're losing sight of the main issue that we reported: the problem is with line breaks in the output generated when using /vsistdout/ together with the CREATE_CSVT=YES option.

Even pointed out that it works as expected without that flag, but when the flag is used the generated output is not valid, because the "Fields with embedded line breaks must be quoted" rule is not followed. IMHO, although the generated output is not a CSV itself, we should be able to delete the first two lines (projection info and types) and treat the rest of the content as a CSV.

What we're doing is streaming the /vsistdout/ output to another process that performs some steps on the resulting CSV. In all other cases it works correctly, because the output of the ogr2ogr execution is a valid CSV once the first two lines are deleted, but in the case reported in my first email it is not. The CREATE_CSVT=YES option is mandatory for us because, for the moment, it is required in order to use the GEOMETRY_NAME=geom one, so we don't have any workaround.

Just wanted to confirm whether that's expected on your side (generating output that is not a valid CSV in the end)!
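
For reference, the consumer side described above (drop the first two lines, then treat the rest as CSV) can be sketched in Python. This is only a sketch under the assumption stated in this thread, namely that the stream starts with one projection line and one field-types line; the function name and the sample text are hypothetical and are not real ogr2ogr output.

```python
import csv
import io
from typing import Iterator, List, TextIO


def csv_rows_skipping_prefix(stream: TextIO, prefix_lines: int = 2) -> Iterator[List[str]]:
    """Yield CSV rows after discarding the first `prefix_lines` physical lines.

    Assumption: the stream begins with one projection line and one
    field-types line, as described in this thread.
    """
    for _ in range(prefix_lines):
        stream.readline()  # discard the assumed .prj / .csvt line
    # csv.reader keeps a quoted field with embedded line breaks as one value
    yield from csv.reader(stream)


if __name__ == "__main__":
    # Hypothetical sample, not real ogr2ogr output.
    sample = io.StringIO(
        "PLACEHOLDER PROJECTION LINE\n"
        "WKT,String,String\n"
        "geom,id,descriptio\n"
        '"POINT (1 2)","1",first row\n'
        '"POINT (3 4)","2","this is\nmy string"\n',
        newline="",
    )
    for row in csv_rows_skipping_prefix(sample):
        print(row)
```

In a real pipeline the same function could be fed `sys.stdin`. It only yields the expected rows when fields with embedded line breaks arrive quoted, which is exactly the property in question here.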

On Wed, 3 May 2023 at 21:05, Robert Hewlett (<rob.h...@gmail.com>) wrote:

    Hi,

    I just tested with : GDAL 3.6.4, released 2023/04/17

    Using ogr2ogr as follows:
    ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES
    I get three files but no geometry.

    ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT
    I get three files, with the geometry as WKT in a column named WKT:

    WKT,id,poi_name,poi_types
    "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
    "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"

    ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
    I get three files, with the geometry as WKT but in a column named geom:

    geom,id,poi_name,poi_types
    "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
    "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"

    What does
    ogr2ogr --version
    report back?



    On Wed, May 3, 2023 at 9:38 AM Robert Hewlett <rob.h...@gmail.com>
    wrote:

        Hi,

        Not to start a controversy but it feels like the standard
        hints at three files. Did the standard change?

        If it is three files, which works for me in QGIS and geopandas
        (i.e. the data lands where it is supposed to), then more layer
        creation options are needed to handle the SRID/CRS:

        CREATE_PRJ=YES/NO
        or -t_srs and/or -s_srs triggers the dot-prj file being created.

        Just saying 😊.

        In the meantime would a short python script help parse the one
        file into three?
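
A short script of that kind could look like the sketch below. It assumes the layout described elsewhere in this thread (line 1 is the projection, line 2 the field types, everything after that the CSV); the function names and the sample text are hypothetical, and a projection that spans several lines would break this simple split.

```python
import tempfile
from pathlib import Path
from typing import Tuple

# Hypothetical combined stream, built from the sample values quoted in this thread.
SAMPLE = (
    "EPSG:26910\n"
    "Integer,Integer,WKT\n"
    "line_id,point_id,geom\n"
    '1,1,"POINT(1000 1000)"\n'
)


def split_combined_output(text: str) -> Tuple[str, str, str]:
    """Split combined text into (prj, csvt, csv_body).

    Assumption: line 1 is the projection and line 2 the field types.
    Raises ValueError if the text has fewer than three lines.
    """
    prj, csvt, csv_body = text.split("\n", 2)
    return prj.rstrip("\r"), csvt.rstrip("\r"), csv_body


def write_geocsv_triple(text: str, base: Path) -> None:
    """Write <base>.prj, <base>.csvt and <base>.csv from the combined text."""
    prj, csvt, csv_body = split_combined_output(text)
    base.with_suffix(".prj").write_text(prj + "\n", encoding="utf-8")
    base.with_suffix(".csvt").write_text(csvt + "\n", encoding="utf-8")
    # newline="" leaves line breaks inside quoted fields untouched
    with open(base.with_suffix(".csv"), "w", encoding="utf-8", newline="") as f:
        f.write(csv_body)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_geocsv_triple(SAMPLE, Path(tmp) / "poi_out")
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

The split works on physical lines only, so it does not repair a CSV body whose embedded line breaks were written unquoted.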


        On Wed, May 3, 2023 at 9:16 AM Moises Calzado via gdal-dev
        <gdal-dev@lists.osgeo.org> wrote:

            Hi Robert,

            Yes, we're getting one with all the info!

            On Wed, 3 May 2023 at 18:14, Robert Hewlett
            (<rob.h...@gmail.com>) wrote:

                Just to clarify, instead of getting three files you
                are getting one with all the info: types, projection,
                data?

                https://giswiki.hsr.ch/GeoCSV

                On Wed, May 3, 2023 at 8:57 AM Moises Calzado via
                gdal-dev <gdal-dev@lists.osgeo.org> wrote:

                    We're also specifying the GEOM_POSSIBLE_NAMES, so
                    it would be great if with that option we could use
                    the GEOMETRY_NAME without using the
                    CREATE_CSVT=YES option.

                    Regarding emitting the .prj and .csvt in
                    /vsistdout/ mode, that's why I'm saying that there
                    is an issue in generating the resulting CSV.
                    The way we see it, when using /vsistdout/ the
                    result is a CSV file with the .prj information in
                    the first line and the .csvt in the second line.
                    We handle the result by deleting the first two
                    lines and using the rest of the content as a CSV,
                    which should be equal to the result obtained when
                    running ogr2ogr without the CREATE_CSVT=YES option.
                    We're probably missing something, but as we see it,
                    the generated CSV should be a valid one. Does that
                    make sense?

                    Thanks so much for your help!

                    On Wed, 3 May 2023 at 15:10, Robert Hewlett
                    (<rob.h...@gmail.com>) wrote:

                        The .CSVT and .PRJ files help to make a proper
                        GeoCSV dataset, which helps with QGIS and geopandas.
                        The column name that I use in the CSV is
                        usually geom, and WKT shows up in the CSVT file,
                        which seems to be a one-line file that hints
                        at the data types in the CSV file.

                        I hope that makes sense.

                        CSVT
                        Integer, Integer,WKT

                        CSV
                        line_id,point_id,geom
                        1,1,"POINT(1000 1000)"

                        PRJ
                        EPSG:26910




                        On Wed, May 3, 2023, 05:23 Moises Calzado via
                        gdal-dev <gdal-dev@lists.osgeo.org> wrote:

                            Hi Even,

                            Thanks so much for taking a look into that
                            one!

                            I have one question regarding the CSVT
                            content: we're not really using it, but
                            it's required when using the GEOMETRY_NAME
                            layer creation option, as can be checked
                            in the CSV driver documentation:

                                * GEOMETRY_NAME=name (Starting with
                                  GDAL 2.1): Name of geometry
                                  column. Only used if
                                  GEOMETRY=AS_WKT and
                                  CREATE_CSVT=YES. Defaults to WKT

                            We really need this flag because we are
                            processing files that contain geometries
                            with different column names, and we always
                            want the same geometry name in the
                            generated output. Are we missing something
                            that would let us avoid this problem while
                            still using that flag?
                            In my humble opinion, generating an
                            invalid CSV when using -lco
                            CREATE_CSVT=YES looks like a bug to me,
                            as I can't see any reason why strings
                            containing line breaks can't be quoted.

                            Could you please shed some light on this?

                            Looking forward to your reply,
                            Regards.

                            On Wed, 3 May 2023 at 14:00, Even
                            Rouault (<even.roua...@spatialys.com>)
                            wrote:

                                you didn't post to the list

                                On Sat, 29 Apr 2023 at 15:44, Even
                                Rouault
                                (<even.roua...@spatialys.com>) wrote:

                                    Moises,

                                    as far as I can see with your
                                    example, the CSV driver behaves
                                    "properly" in reading and writing
                                    of field values with line breaks.

                                    It follows the "Fields with
                                    embedded line breaks must be
                                    quoted" rule of
                                    https://en.wikipedia.org/wiki/Comma-separated_values

                                    $ ogr2ogr out.csv /vsizip/dataframe.zip

                                    $ cat out.csv
                                    id,descriptio
                                    "1",This is my third row
                                    "2","this is
                                    my string
                                    "
                                    "3",This is my third row

                                    $ ogrinfo out.csv -al
                                    INFO: Open of `out.csv'
                                          using driver `CSV' successful.

                                    Layer name: out
                                    Geometry: None
                                    Feature Count: 3
                                    Layer SRS WKT:
                                    (unknown)
                                    id: String (0.0)
                                    descriptio: String (0.0)
                                    OGRFeature(out):1
                                      id (String) = 1
                                      descriptio (String) = This is my third row

                                    OGRFeature(out):2
                                      id (String) = 2
                                      descriptio (String) = this is
                                    my string


                                    OGRFeature(out):3
                                      id (String) = 3
                                      descriptio (String) = This is my third row

                                    But in your example using
                                    /vsistdout/ and -lco
                                    CREATE_CSVT=YES is going to
                                    result in an invalid CSV file
                                    which will mix both the .csvt and
                                    .csv content
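
The quoting rule itself can be checked independently of GDAL. The sketch below uses Python's standard csv module (not the GDAL CSV driver) with values modelled on the example above: the writer quotes the field that contains line breaks, and the reader restores it as a single value.

```python
import csv
import io

rows = [
    ["id", "descriptio"],
    ["1", "This is my third row"],
    ["2", "this is\nmy string\n"],  # value with embedded line breaks
]

# Write with the default minimal quoting: only fields that need quotes get them.
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Read the text back: the quoted multi-line field comes back as one value.
buffer.seek(0)
round_trip = list(csv.reader(buffer))
print(round_trip == rows)
```
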

                                    Even

                                    On 24/04/2023 at 13:34, Moises
                                    Calzado via gdal-dev wrote:
                                    Hello!

                                    We're trying to convert a
                                    Shapefile into a CSV using
                                    ogr2ogr and we're having some
                                    issues while dealing with some
                                    columns that contain line breaks
                                    inside their values. If a value
                                    contains the following string,
                                    ogr2ogr treats the embedded line
                                    break as the end of the record
                                    and returns two lines.

                                        "this is my \n value"


                                    That's the command that we're
                                    executing:

                                        ogr2ogr -f CSV -skipfailures \
                                          -makevalid /vsistdout/ \
                                          /vsizip/shapefile.zip \
                                          -simplify 0.00001 -dim XY \
                                          -t_srs EPSG:4326 \
                                          -lco GEOMETRY=AS_WKT \
                                          -lco GEOMETRY_NAME=geom \
                                          -lco CREATE_CSVT=YES > result.csv


                                    Is this expected behaviour, or
                                    is there any way to avoid it?
                                    Here is an example Shapefile so
                                    that you can try to reproduce
                                    that behaviour:
                                    https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing

                                    Thanks so much in advance,
                                    Regards.

-- *Moises Calzado*

                                    Support Engineer

                                    +34671264286 |
                                    mcalz...@carto.com | CARTO
                                    <https://www.carto.com/>

                                    
<https://spatial-data-science-conference.com/2023/london/>


                                    
_______________________________________________
                                    gdal-dev mailing list
                                    gdal-dev@lists.osgeo.org
                                    
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.


