[gdal-dev] CSV driver inconsistent separator dealing to failures

2023-05-23 Thread Moises Calzado via gdal-dev
Hello everyone,

I'm trying to use ogr2ogr with a CSV file that uses semicolons as the
separator, but one of its fields contains a comma. Because of that comma,
the driver takes the comma as the separator, so the file is not parsed
correctly.

However, when trying to open the CSV file with another application it works
like a charm, as the separator is correctly identified.

I've been having a look at the function that identifies the separator, and
it seems that if it finds two possible separators, it takes the comma as
the right one. To double-check that, I executed the command in debug mode,
and the following warning was shown:

CSV: Inconsistent separator. ';' and ',' found. Using ',' as default


In my humble opinion this approach makes sense when the separator is
ambiguous, but in this case the first CSV line contains about 10
semicolons (the real separator) and just one comma. I believe the current
behaviour could be improved by checking which candidate separator occurs
most often. What do you think?
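As a rough illustration of the idea, here is a minimal sketch in Python. It is not the driver's C++ code and not GDAL's actual detection logic; the function name `guess_separator`, the candidate list and the sample line are all assumptions made for this example. It counts each candidate separator outside double-quoted sections of the first line and picks the most frequent one, falling back to the comma on a tie.

```python
from collections import Counter

# Candidate separators; this list is an assumption made for the example.
CANDIDATES = (",", ";", "\t", "|")


def guess_separator(header_line: str, default: str = ",") -> str:
    """Return the candidate that occurs most often outside double quotes."""
    counts = Counter()
    in_quotes = False
    for ch in header_line:
        if ch == '"':
            # A doubled "" toggles twice, so the quoted state is unchanged.
            in_quotes = not in_quotes
        elif not in_quotes and ch in CANDIDATES:
            counts[ch] += 1
    if not counts:
        return default
    best, best_count = counts.most_common(1)[0]
    # On a tie that involves the default separator, keep the default.
    if counts.get(default, 0) == best_count:
        return default
    return best


# Made-up first line: ten semicolons and a single comma inside one field.
line = "id;name;address, city;lat;lon;a;b;c;d;e;f"
print(guess_separator(line))
```

With a line like the one described above (many semicolons, one comma) the sketch selects the semicolon, while a genuinely ambiguous line still falls back to the comma.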

-- 
*Moises Calzado*

Support Engineer

+34671264286 | mcalz...@carto.com | CARTO 

___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

2023-05-08 Thread Moises Calzado via gdal-dev
Hey Robert!

We already have something like head -n -2 in our pipeline, but the problem
is that the CSV that remains after omitting these two lines is not valid,
as strings with line breaks are not double-quoted.

Jukka, thanks so much for proposing the alternative approach! We'll check
whether the GeoJSONseq output works as expected and we'll evaluate whether
the current process can be updated to use it.

Once more, thanks so much for all your help on that one. We really
appreciate it!

Regards.

On Fri, 5 May 2023 at 15:13, Robert Hewlett wrote:

> Can something such as
> head -n -2
> Be part of the pipeline?
>
> The 3 text files are being combined into 1 stream.
>
>- Line 1 CRS/SRID from the .prj
>- Line 2 Types from the .csvt
>- Line 3 to the end from the .csv
>
> Which is great in some ways as the SRID does not go missing and header
> info is at the head.
>
> It is just that I found from line 3 to the end were well formed with the
> renamed geometry column but I am testing on Windows 10 with 3.6.
>
> I do not know if /vsizip/ as output is allowed or works i.e. all three
> text files as one streamed zip file then extract just the CSV file later in
> the process.
>
> Moving to a one file spatial format as mentioned above might help. It is
> just that a GeoCSV dataset is a combination of three files.
>
> Maybe a many-to-one-back-to-many-scenario might help.
>
> There are several multi-file spatial formats that would need to be zipped
> so that you could stream just one thing.
>
> I hope that makes sense.
>
> On Fri, May 5, 2023 at 2:58 AM Rahkonen Jukka <
> jukka.rahko...@maanmittauslaitos.fi> wrote:
>
>> Hi,
>>
>>
>>
>> Have you considered outputting GeoJSONseq
>> https://gdal.org/drivers/vector/geojsonseq.html instead of CSV, which to
>> my mind is only a workaround as a geodata format? Maybe JSON could handle
>> your newlines as well.
>>
>>
>>
>> -Jukka Rahkonen-

Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

2023-05-05 Thread Moises Calzado via gdal-dev
Hi Even!

I've just created the two issues:
- https://github.com/OSGeo/gdal/issues/7699
- https://github.com/OSGeo/gdal/issues/7700

Robert, as I explained before, we need `/vsistdout/` because we're
processing the file in streaming mode, so we can't save the result to
storage.
Unfortunately, the problem arises when using it.

On Thu, 4 May 2023 at 15:39, Even Rouault wrote:

> Moises,
>
> please file 2 issues in the GitHub issue tracker:
>
> - one about /vsistdout/ where .csvt and .prj content shouldn't be emitted
>
> - one about decoupling the layer GEOMETRY_NAME creation option from
> CREATE_CSVT=YES
>
> Even

Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

2023-05-04 Thread Moises Calzado via gdal-dev
Hi Robert!

I think we're losing sight of the main issue that we reported: the problem
is related to line breaks in the output generated while using /vsistdout/
and the CREATE_CSVT=YES option.

Even pointed out that it works as expected without that flag, but when
it's used the generated output is not okay, as the "Fields with embedded
line breaks must be quoted" rule is not followed.
IMHO, although the generated output is not a CSV itself, we should be able
to delete the first two lines (projection info and types) and treat the
rest of the content as a CSV.

What we're doing is streaming the /vsistdout/ output to another process
that performs some steps on the resulting CSV. In all other cases it works
correctly, as the ogr2ogr output is a valid CSV once the first two lines
are deleted, but in the case reported in my first email it's not.
The CREATE_CSVT=YES option is mandatory for us because, for the moment, it
is required in order to use GEOMETRY_NAME=geom, so we don't have any
workaround.

I just wanted to confirm whether that's expected (generating an output
that is not a valid CSV in the end)!

On Wed, 3 May 2023 at 21:05, Robert Hewlett wrote:

> Hi,
>
> I just tested with : GDAL 3.6.4, released 2023/04/17
>
> Using ogr2ogr as follows:
> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES
> I get three files but no geometry.
>
> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT
> I get three files with the geometry as WKT, in a column named WKT:
>
> WKT,id,poi_name,poi_types
> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>
> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
> I get three files with the geometry as WKT, but in a column named geom:
>
> geom,id,poi_name,poi_types
> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>
> What does
> ogr2ogr --version
> report back?
>
>
>
> On Wed, May 3, 2023 at 9:38 AM Robert Hewlett  wrote:
>
>> Hi,
>>
>> Not to start a controversy but it feels like the standard hints at three
>> files. Did the standard change?
>>
>> If it is three files, which works for me in QGIS and geopandas (i.e. data
>> lands where it is supposed to), then more layer creation options are
>> needed to handle the SRID/CRS
>>
>> CREATE_PRJ=YES/NO
>> or -t_srs and/or -s_srs triggers the dot-prj file being created.
>>
>> Just saying 😊.
>>
>> In the meantime would a short python script help parse the one file into
>> three?

Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

2023-05-03 Thread Moises Calzado via gdal-dev
Hi Robert,

Yes, we're getting one with all the info!

On Wed, 3 May 2023 at 18:14, Robert Hewlett wrote:

> Just to clarify, instead of getting three files you are getting one with
> all the info: types, projection, data?
>
> https://giswiki.hsr.ch/GeoCSV

Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

2023-05-03 Thread Moises Calzado via gdal-dev
We're also specifying GEOM_POSSIBLE_NAMES, so it would be great if with
that option we could use GEOMETRY_NAME without using the
CREATE_CSVT=YES option.

Regarding emitting the .prj and .csvt in /vsistdout/ mode, that's why I'm
saying that there is an issue in how the resulting CSV is generated.
The way we see it, when using /vsistdout/ the result is a CSV file with
the .prj information in the first line and the .csvt in the second line.
We handle the result by deleting the first two lines and using the rest of
the content as a CSV, which should be equal to the result obtained when
using ogr2ogr without the CREATE_CSVT=YES option.
We're probably missing something, but as we see it, the generated CSV
should be a valid one. Does that make sense?
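To make the layout described above concrete, here is a minimal Python sketch that splits such a combined stream into its three parts. It assumes, as stated in this thread, that line 1 carries the .prj content, line 2 the .csvt content and everything from line 3 onwards is the CSV. The function name `split_geocsv_stream` and the sample data are made up for illustration; this is not something ogr2ogr provides.

```python
import csv
import io


def split_geocsv_stream(text: str):
    """Split a combined stream into (prj_line, csvt_line, csv_rows).

    Assumption: line 1 is the .prj content, line 2 the .csvt content,
    and the remainder is the CSV itself.
    """
    stream = io.StringIO(text, newline="")
    prj_line = stream.readline().rstrip("\r\n")
    csvt_line = stream.readline().rstrip("\r\n")
    # csv.reader keeps a quoted, embedded line break inside a single field.
    rows = list(csv.reader(stream))
    return prj_line, csvt_line, rows


# Invented sample: the last record has a properly quoted line break.
sample = (
    "EPSG:4326\n"
    "WKT,String,String\n"
    "geom,id,descriptio\n"
    '"POINT (1 2)","1",This is my first row\n'
    '"POINT (3 4)","2","this is\nmy string"\n'
)
prj, csvt, rows = split_geocsv_stream(sample)
print(prj, csvt, len(rows))
```

The remainder only parses into the expected number of records when fields with embedded line breaks are quoted, which is the condition the reported output does not meet.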

Thanks so much for your help!

On Wed, 3 May 2023 at 15:10, Robert Hewlett wrote:

> The .CSVT and .PRJ files help to make a proper GeoCSV dataset, which helps
> with QGIS and geopandas. The column name that I use in the CSV is usually
> geom, and WKT shows up in the CSVT file, which seems to be a one-line file
> that hints at the data types in the CSV file.
>
> I hope that makes sense.
>
> CSVT
> Integer, Integer,WKT
>
> CSV
> line_id,point_id,geom
> 1,1,"POINT(1000 1000)"
>
> PRJ
> EPSG:26910

Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

2023-05-03 Thread Moises Calzado via gdal-dev
Hi Even,

Thanks so much for taking a look into that one!

I have one question regarding the CSVT content: we're not really using it,
but it's required when using the GEOMETRY_NAME layer creation option, as
can be checked in the CSV driver documentation:

> - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry column.
>   Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT

We really need this flag as we are processing files that contain
geometries with different column names, and we always want the same
geometry name in the generated output. Are we missing something when using
that flag to avoid this problem?
In my humble opinion, generating an invalid CSV when using -lco
CREATE_CSVT=YES looks like a bug, as I can't see why strings containing
line breaks can't be quoted.

Could you please shed some light on this?

Looking forward to your reply,
Regards.

On Wed, 3 May 2023 at 14:00, Even Rouault wrote:

> you didn't post to the list
>
> On Sat, 29 Apr 2023 at 15:44, Even Rouault wrote:
>
>> Moises,
>>
>> as far as I can see with your example, the CSV driver behaves "properly"
>> in reading and writing of field values with line breaks.
>>
>> It follows the "Fields with embedded line breaks must be quoted" rule of
>> https://en.wikipedia.org/wiki/Comma-separated_values
>>
>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>
>> $ cat out.csv
>> id,descriptio
>> "1",This is my third row
>> "2","this is
>> my string
>> "
>> "3",This is my third row
>>
>> $ ogrinfo out.csv -al
>> INFO: Open of `out.csv'
>>   using driver `CSV' successful.
>>
>> Layer name: out
>> Geometry: None
>> Feature Count: 3
>> Layer SRS WKT:
>> (unknown)
>> id: String (0.0)
>> descriptio: String (0.0)
>> OGRFeature(out):1
>>   id (String) = 1
>>   descriptio (String) = This is my third row
>>
>> OGRFeature(out):2
>>   id (String) = 2
>>   descriptio (String) = this is
>> my string
>>
>>
>> OGRFeature(out):3
>>   id (String) = 3
>>   descriptio (String) = This is my third row
>>
>> But in your example, using /vsistdout/ together with -lco CREATE_CSVT=YES
>> is going to result in an invalid CSV file that mixes the .csvt and .csv
>> content
>>
>> Even

[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

2023-04-24 Thread Moises Calzado via gdal-dev
Hello!

We're trying to convert a Shapefile into a CSV using ogr2ogr and we're
having some issues with columns that contain line breaks inside their
values. If we have a record with the following string, ogr2ogr treats the
line break as the end of the record and returns two lines.

"this is my \n value"


That's the command that we're executing:

ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ /vsizip/shapefile.zip \
  -simplify 0.1 -dim XY -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT \
  -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
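For reference, the quoting behaviour we would expect for a value with an embedded line break can be sketched with Python's standard csv module. This only illustrates the "fields with embedded line breaks must be quoted" convention; it says nothing about how the GDAL CSV driver is implemented, and the record used here is invented.

```python
import csv
import io

value = "this is my \n value"

# Write one record; the default QUOTE_MINIMAL behaviour quotes the field
# because it contains a line break.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["2", value])
raw = buf.getvalue()

# Naive line splitting sees two physical lines...
physical_lines = raw.splitlines()

# ...while a CSV-aware reader recovers a single two-field record.
records = list(csv.reader(io.StringIO(raw, newline="")))

print(len(physical_lines), len(records))
```

When the field is written without quotes, a reader has no way to tell the embedded line break from the end of the record, which matches the two lines we are getting.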

Is this an expected behaviour, or is there any way to avoid this?
Sharing an example Shapefile so that you can try to reproduce that
behaviour:
https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing

Thanks so much in advance,
Regards.

-- 
*Moises Calzado*

Support Engineer

+34671264286 | mcalz...@carto.com | CARTO 



[gdal-dev] Latest GDAL version not transforming lat/lng into geom column

2022-11-21 Thread Moises Calzado via gdal-dev
Hello everyone,

We've just updated to the latest GDAL version (v3.6.0) and it seems that
something is not working correctly when trying to obtain a geom column from
a CSV containing latitudes and longitudes.

This is the command that is being used:

ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ CSV:my_file.csv \
  -simplify 0.1 -dim XY -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT \
  -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES -s_srs EPSG:4326 \
  -oo KEEP_GEOM_COLUMNS=NO \
  -oo X_POSSIBLE_NAMES=point_longitude,longitude,longitud,lon,Lon,Longitude,longitudedecimal,decimallongitude,decimallong,lng,long,Lng \
  -oo Y_POSSIBLE_NAMES=latitude,latitud,lati,lat,Latitude,decimallat,decimallatitude,latitudedecimal,point_latitude \
  -oo GEOM_POSSIBLE_NAMES=geom,Geom,geometry,the_geom,wkt,wkb,wkt_geometry,wkb_geometry \
  -oo EMPTY_STRING_AS_NULL=YES -oo QUOTED_FIELDS_AS_STRING=NO \
  -lco PRECISION=NO -oo AUTODETECT_SIZE_LIMIT=50


The file being processed contains a column called lat and another one
called lon. When I run the same process in a Docker container with GDAL
3.5.1 it works like a charm. We've also tried version 3.5.3, and it fails
there too. Is that expected?
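To spell out the result we expect from the command, here is a small Python sketch of the same transformation: find the longitude and latitude columns by name, build a WKT point in a geom column, and drop the original columns. It only mirrors the intended output; the exact-match lookup, the shortened name lists and the sample row are assumptions of this sketch, not a description of how GDAL resolves X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES.

```python
import csv
import io

# Shortened candidate lists, taken from the options in the command.
X_NAMES = ["longitude", "lon", "Lon", "lng", "long"]
Y_NAMES = ["latitude", "lat", "Latitude"]


def find_column(fieldnames, candidates):
    """Return the first candidate present among the field names, else None."""
    for candidate in candidates:
        if candidate in fieldnames:
            return candidate
    return None


def rows_with_geom(text: str):
    """Yield each row with a WKT 'geom' column replacing the lon/lat columns."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    x_col = find_column(reader.fieldnames, X_NAMES)
    y_col = find_column(reader.fieldnames, Y_NAMES)
    if x_col is None or y_col is None:
        raise ValueError("no longitude/latitude columns found")
    for row in reader:
        lon, lat = float(row.pop(x_col)), float(row.pop(y_col))
        yield {"geom": f"POINT ({lon} {lat})", **row}


# Invented sample with the column names mentioned in this message.
sample = "id,lat,lon\n1,40.4,-3.7\n"
print(list(rows_with_geom(sample)))
```

Note the axis order in the WKT: longitude first, then latitude.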

Looking forward to your response,
Regards!

-- 
*Moises Calzado*

Support Engineer

(US) +1 917 463 3232 | (ES) +34 911 165 823 | mcalz...@carto.com



[gdal-dev] ogr2ogr error: ERROR 1: Maximum number of characters allowed reached.

2022-09-20 Thread Moises Calzado via gdal-dev
Hello everyone,

We're facing an issue with ogr2ogr in version 3.5.1: we've found a
dataset that triggers an error when running the following command:

ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ \
  CSV:munic_s_anonymized.csv -simplify 0.1 -dim XY -t_srs EPSG:4326 \
  -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > test.csv


The error that is thrown by ogr2ogr is the following one:

ERROR 1: Maximum number of characters allowed reached.


When it appears, the process stops processing data. We've been able to
run the process without any issue on GDAL 3.2.2, so we don't know whether
something has changed or whether it's unexpected behaviour. Have you seen
a similar error in the latest versions of GDAL?

If you want to reproduce it, there is a link to the file that provoked
this error.

Thanks so much in advance,
Kind regards.
-- 
*Moises Calzado*

Support Engineer

(US) +1 917 463 3232 | (ES) +34 911 165 823 | mcalz...@carto.com



[gdal-dev] (no subject)

2022-09-19 Thread Moises Calzado via gdal-dev
Hello everyone,

We're performing some tests with ogrinfo trying to read GPKG files, and
we're facing some issues executing the command with remote GPKG files in
WAL mode. As can be checked in the following command output, ogr fails
while reading the file content:

ERROR: Error: Command failed: ogrinfo /vsicurl/MY_FILE_URL has GPKG
application_id, but non conformant file extension
ERROR 1: unable to open database file: this file is a WAL-enabled
database. It cannot be opened because it is presumably read-only or in a
read-only directory.
ERROR 1: Only read-only mode is supported for /vsicurl


What is strange here is that the same file works correctly if we execute
the command using the downloaded file.
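As a side note, whether a GeoPackage (an SQLite file) is in WAL mode can be checked from its header before publishing it remotely. The sketch below is a minimal Python check based on the SQLite file format, where bytes 18 and 19 of the header hold the write and read format versions and the value 2 means WAL. The helper names are invented for this example and nothing here is part of GDAL.

```python
def read_header(path: str) -> bytes:
    """Read the 100-byte SQLite header from a local file."""
    with open(path, "rb") as fh:
        return fh.read(100)


def is_wal_database(header: bytes) -> bool:
    """Check the SQLite file header for the WAL journal mode markers."""
    if len(header) < 20 or not header.startswith(b"SQLite format 3\x00"):
        raise ValueError("not an SQLite database header")
    # Bytes 18 and 19 are the file format write/read versions; 2 means WAL.
    return header[18] == 2 and header[19] == 2


# Usage, with a hypothetical local copy of the file:
#   is_wal_database(read_header("my_file.gpkg"))
```

A file flagged this way could be switched back with `PRAGMA journal_mode=DELETE` on a local copy before uploading it again. That is only a suggestion based on the error text, not something we have confirmed.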

Is there any way we can execute the command using a remote file?

Thanks so much in advance,
Kind regards!
-- 
*Moises Calzado*

Support Engineer

(US) +1 917 463 3232 | (ES) +34 911 165 823 | mcalz...@carto.com



[gdal-dev] Ogr2ogr taking too much time to process a MapInfo TAB file

2022-07-27 Thread Moises Calzado via gdal-dev
Hi everyone!

We're using ogr2ogr to convert MapInfo TAB files into CSV format using the
following command:

ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ /vsizip/onLDU.zip \
  -oo AUTODETECT_TYPE=YES -lco CREATE_CSVT=YES > test_2.csv


The file weighs ≈200 MB and the process takes too long to finish (almost
20 min), so we don't know whether something is wrong with the command
that we launch.

[image: Screenshot 2022-07-20 at 12.55.14.png]

However, if we launch the same command against the .tab file instead of
using the vsizip virtual file system, it takes less than 30 seconds to
complete.
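Given that difference, one workaround we could try is to extract the archive to a temporary directory first and point ogr2ogr at the extracted .tab file. The Python sketch below shows that idea; the helper names are invented, the ogr2ogr options are simply the ones from the command above, and the command is only built here, not executed.

```python
import zipfile
from pathlib import Path


def extract_tab_dataset(zip_path: str, dest_dir: str) -> Path:
    """Extract the archive and return the path of the first .tab file found."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Compare suffixes case-insensitively, since the file may be named .TAB.
    tab_files = sorted(
        p for p in Path(dest_dir).rglob("*") if p.suffix.lower() == ".tab"
    )
    if not tab_files:
        raise FileNotFoundError("no .tab file in archive")
    return tab_files[0]


def build_command(tab_path: Path) -> list[str]:
    """Same options as the command above, with the extracted .tab as input."""
    return [
        "ogr2ogr", "-f", "CSV", "-skipfailures", "-makevalid",
        "/vsistdout/", str(tab_path),
        "-oo", "AUTODETECT_TYPE=YES", "-lco", "CREATE_CSVT=YES",
    ]
```

The resulting list can be passed to `subprocess.run` with stdout redirected to the target file. This trades temporary disk space for speed, so it only helps where writing the extracted files locally is acceptable.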

Have you ever seen something like this? Do you know whether it's expected
that this kind of file takes so long to process, or are we doing
something wrong?

Thanks so much for your help in advance,
Regards!
-- 
*Moises Calzado*

Support Engineer

(US) +1 917 463 3232 | (ES) +34 911 165 823 | mcalz...@carto.com



[gdal-dev] Ogr2ogr issue with big integers

2022-07-18 Thread Moises Calzado via gdal-dev
 test.csv

Hello everyone!

I'm dealing with some issues with ogr2ogr trying to convert my data to a
CSV file guessing the data types. I'm using the following command:

ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ CSV:test.csv -oo
AUTODETECT_TYPE=YES -lco CREATE_CSVT=YES > test.csv

When I add the AUTODETECT_TYPE=YES option, ogr2ogr applies some weird
transformations to the "uprn" column. If you compare the generated output
with the original file, you'll notice that some integers have turned into
negative numbers. Do you know if I'm doing something wrong, or if it's a
bug?
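One possible explanation, which is purely an assumption on my side and not confirmed anywhere in this message, is that values too large for a signed 32-bit integer are being stored in a 32-bit type and wrap around. The Python sketch below shows what that kind of wrap-around looks like and how to check whether a column needs a 64-bit type; the sample values are made up and are not taken from test.csv.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def needs_int64(values) -> bool:
    """True if any value falls outside the signed 32-bit range."""
    return any(not INT32_MIN <= v <= INT32_MAX for v in values)


uprns = [5, 2147483647, 3000000000]  # made-up values, not from test.csv
print([wrap_int32(v) for v in uprns])
print(needs_int64(uprns))
```

If the negative values in the output match what `wrap_int32` gives for the original values, that would support the overflow hypothesis; if they don't, the cause lies elsewhere.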

Thanks so much for your help,
Regards!



-- 
*Moises Calzado*

Support Engineer

(US) +1 917 463 3232 | (ES) +34 911 165 823 | mcalz...@carto.com
