Hi Régis

Interesting thoughts. I've renamed this thread and copied it to
qgis-developer, as I think it is worth getting broader input on this. Hope
you don't mind my putting your email there.

If I've understood correctly, the main points of your suggestion are:

1) The delimited text provider and the GUI should be able to read a VRT/CSVT 
file if one is present to determine field types etc

2) The delimited text provider GUI should be able to save settings to a VRT/CSVT

3) The user should be able to explicitly set data types for each column

This is in line with some of my thinking on this. I was planning to add some
way of saving settings as a "delimited text file type" that could then be
selected when adding a new text layer, analogous to saving styles in QGIS.
At the moment (i.e. in master, not 1.8) the plugin remembers settings based on
file extension, but that doesn't provide enough granularity for me. I hadn't
decided where to store the settings yet (i.e. whether in a file, in the QGIS
settings, or somewhere else).

Like you, I am also not very happy with the way that the provider determines 
field types by scanning the file when it loads it.  So if you put different 
data in the file, and reload it into QGIS, then the data types may change.
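
As a rough illustration of why scan-based detection is unstable, here is a
minimal Python sketch. It is not the provider's actual code: infer_type is a
hypothetical helper, and the rule it applies (integer if every value parses as
an integer, else real, else text) is an assumption made for the example.

```python
def infer_type(values):
    """Guess a column type from its values (hypothetical rule, for illustration).

    The result depends entirely on the data scanned, so editing the file
    can silently change the type of a column on the next load.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "text"
    for name, cast in (("integer", int), ("real", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


# A key column holding only digits is detected as integer, so "09" would be
# read as 9. Adding one non-numeric row flips the whole column to text.
print(infer_type(["09", "31"]))
print(infer_type(["09", "31", "2A"]))
```

Under this rule the same column changes type depending on which rows happen to
be in the file, which is the behaviour an explicit type definition would avoid.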

I hadn't thought about using VRT or CSVT files and I really like the idea.

As a really simple first step, which would not be a major refactoring, the
provider could check for a CSVT file when it loads a CSV file and use it to
determine field types. This would not require any UI or API changes at all,
and on its own could be very useful. The only difficulty I can foresee is how
to manage files with extensions other than ".csv". Should it just look for a
file matching the name of the input file with a "t" at the end? Or should it
only use this option for files that are named ".csv"? Or should it look for
".csvt" whatever the extension of the data file? Other than that this really
isn't much work, and I'd be keen to implement it. I guess the simplest
approach would be the first option: looking for a file named the same as the
data file but with a "t" appended. If it exists and can be interpreted, then
it will be used to define field types. This would make it really easy to
manage when creating data, and would be compatible with GDAL/OGR.
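
To make the first option concrete, here is a minimal Python sketch of the
lookup (the provider itself is C++, so this is only an illustration). The
function names are hypothetical. The type names and the optional
width/precision suffix follow the OGR CSV driver documentation; only the basic
types are handled, and treating an unrecognised entry as "CSVT cannot be
interpreted" is an assumption.

```python
import csv
import os
import re

# Basic type names accepted in a CSVT file by the OGR CSV driver.
KNOWN_TYPES = {"integer", "real", "string", "date", "time", "datetime"}

# Matches e.g. Integer, Real(10.2) or String(20).
ENTRY_RE = re.compile(r"\s*([A-Za-z]+)\s*(?:\(\s*(\d+)(?:\.(\d+))?\s*\))?\s*")


def csvt_path_for(data_path):
    """Sidecar name under the 'append a t' rule: areas.csv -> areas.csvt."""
    return data_path + "t"


def parse_csvt_line(line):
    """Parse a line such as '"Integer","Real(10.2)","String(20)"'.

    Returns a list of (type, width, precision) tuples, or None if any
    entry is not recognised.
    """
    rows = list(csv.reader([line.strip()]))
    if not rows or not rows[0]:
        return None
    fields = []
    for entry in rows[0]:
        m = ENTRY_RE.fullmatch(entry)
        if m is None or m.group(1).lower() not in KNOWN_TYPES:
            return None
        width = int(m.group(2)) if m.group(2) else None
        precision = int(m.group(3)) if m.group(3) else None
        fields.append((m.group(1).lower(), width, precision))
    return fields


def load_field_types(data_path):
    """Return field types from the sidecar file, or None to fall back to scanning."""
    path = csvt_path_for(data_path)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        return parse_csvt_line(f.readline())
```

Only the first line of the sidecar is read here, so anything after it is
ignored by this sketch.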

The VRT file is much more work, as you suggest. The mapping between VRT and
the CSV options is not complete: the options around delimiters, skipped
lines, regular expressions, and so on are not available in the VRT file.
Conversely, several of the VRT options don't apply within QGIS. So there is
quite a lot of work in specifying how this should work. It would also entail
a major reworking of how the file is opened (i.e. you would select the VRT
file in the GUI, which would then have to identify the CSV file, and also
politely handle VRT files which did not define CSV files but defined other
data source types). This seems a lot of work and would end up re-engineering
a lot of what is already in OGR. So (even as I'm writing this) I'm becoming
less convinced that this is a good approach.

Returning to the CSVT idea: once the provider can use the CSVT file, there
remains the question of what GUI/API changes should be made to support it.
Two thoughts come to mind immediately.

One is, should the CSVT idea be extended to support the other metadata
required in setting up the file, such as the delimiter? The OGR specification
for CSVT just defines the field types in the first line. I don't know if OGR
would ignore subsequent lines, in which case additional metadata could go
there and still be compatible with OGR usage. Or should another metadata file
(e.g. .metadata, .qgs, .dlt, ...) be used to hold all the information
specifying how the file should be used? (That sounds really messy.) But it
would be really nice to be able to just select a file and have all these
options automatically populated if the metadata file existed.

The other thought is around your suggestion of writing the CSVT/metadata file.  
The main extra work involved in this is the user interface for defining field 
types.

The dialog box is already quite busy, but I guess a simple approach would be
just to add a row to the preview box under the column headings with a field
type selector for each column, offering values such as
"Auto,Text,Integer,Real,Date,Time,DateTime". The field types could then be
passed through to the provider in the datasource URI (i.e. with a parameter
such as "fieldtypes=text,text,integer,..."). This also doesn't sound like too
much work.
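
As a sketch of how such a parameter could travel in the URI, here is a small
Python example. The "fieldtypes" parameter name and the list of choices come
from the paragraph above; the file:// URI layout, the helper names and the
"delimiter" option in the usage lines are assumptions made for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Choices proposed for the per-column selector in the dialog.
FIELD_TYPE_CHOICES = ("auto", "text", "integer", "real", "date", "time", "datetime")


def build_uri(path, field_types, **options):
    """Build a datasource URI carrying one field type per column (hypothetical layout)."""
    lowered = [t.lower() for t in field_types]
    for t in lowered:
        if t not in FIELD_TYPE_CHOICES:
            raise ValueError("unknown field type: " + t)
    params = dict(options)
    params["fieldtypes"] = ",".join(lowered)
    return "file://" + quote(path) + "?" + urlencode(params)


def field_types_from_uri(uri):
    """Extract the field type list from the URI; empty list if the parameter is absent."""
    values = parse_qs(urlsplit(uri).query).get("fieldtypes")
    if not values:
        return []
    return values[0].split(",")


uri = build_uri("/data/areas.csv", ["Text", "Integer", "Auto"], delimiter=";")
print(uri)
print(field_types_from_uri(uri))
```

Keeping "auto" as an explicit value lets the user override only some columns
and leave the rest to detection.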

Once this is done it would be simple to add a "save settings to metadata file" 
type button to the GUI, which could write the CSVT/metadata file.

This would create one more tricky question, of how to handle conflicts between 
metadata read in a CSVT/metadata file and that in the datasource URI.
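
One possible policy, sketched in Python below, is to let an explicit type in
the URI win for each column and fall back to the CSVT/metadata file only where
the URI says "auto" or says nothing. This is just one option and not a
decision; resolve_field_types is a hypothetical helper.

```python
def resolve_field_types(uri_types, csvt_types, n_columns):
    """Merge per-column types: explicit URI value first, then CSVT, then auto."""
    resolved = []
    for i in range(n_columns):
        from_uri = uri_types[i] if i < len(uri_types) else "auto"
        from_csvt = csvt_types[i] if i < len(csvt_types) else "auto"
        # "auto" in the URI means "no explicit choice", so defer to the CSVT.
        resolved.append(from_uri if from_uri != "auto" else from_csvt)
    return resolved


# Column 1: URI overrides CSVT. Columns 2 and 3: CSVT used. Column 4: detected.
print(resolve_field_types(["text", "auto"], ["integer", "real", "date"], 4))
```

The opposite precedence (file first, URI second) would be the same loop with
the two sources swapped.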

Enough rambling.  I expect it will be a couple of weeks before I can consider 
this much more (though I may consider handling the CSVT file sooner).

Cheers
Chris


> -----Original Message-----
> From: HAUBOURG [mailto:regis.haubo...@eau-adour-garonne.fr]
> Sent: Thursday, 23 May 2013 6:45 a.m.
> To: Chris Crook
> Subject: RE : Delimited text debug
>
> Thanks for your feedback Chris.
>
> I have been thinking of it all day, and got to the following observations and
> conclusions:
>
> 1- there are two concurrent ways to open csv in qgis: ogr native and your
> plugin.
> ogr gdal offers some features like vrt (enabling geometry columns, xy, yx
> columns) and csvt  (basic types for columns) that remain unused in qgis
> (unless your using them for your code).
>
> 2- users have no way to create point from attribute datas, except using your
> plugin. csv export and fields types can be a pain.
>
> 3- there is no way to change a data type on the fly, so user has to do again
> the import, and sometime is trapped if no ETL or database is available or
> understood.
>
> From a user point of view, I think we should do two things:
>
> A: unify import for data sources to avoid the two different entries
>  - merge all import tools based on your approach, with a previz gui (choose
> encoding, skip lines...)
>  - enable others options for all text based files (field delimiter, text
> delimiter, decimal delimiter, trim fields.. )
>  - WKT or XY chooser for geometry fields (for all data sources: native
> geometry / no geometry/ text fields)
>  - field type chooser with automatic guess (gdal does it)
>  - option to save a vrt / csvt so that a user can reopen easily the data
> without redoing all the import stuff.
> This is a big refactoring of vector layer add dialog. Nathan add some mockups
> for this that could do.
>
>  B: add a vector tool to create geometry from any attribute data (xy, wkt.. )
> of any loaded data source. users that imported data could then spatialize
> data in a second step (like Mapinfo does)
>
> In my corp users really need that, so I probably will fund that. Do you have
> some feedback on that?
> Régis
> ________________________________________
> From: Chris Crook [ccr...@linz.govt.nz]
> Sent: Wednesday, 22 May 2013 19:54
> To: HAUBOURG
> Subject: RE: Delimited text debug
>
> Hi Régis
>
> It could be a useful improvement - basically to allow setting types of
> columns.
> Associated with this I'd like to add date types, which would require some
> explicit definition by the user.  This will not make 2.0, but certainly worth
> doing for the next release.
>
> As a workaround for the moment could you add an extra row of dummy data
> with a non-numeric value in the key column.  The provider will then treat it
> as a text column and the joining should work ok.
>
> Cheers
> Chris
> ________________________________________
> From: HAUBOURG [regis.haubo...@eau-adour-garonne.fr]
> Sent: 22 May 2013 23:12
> To: Chris Crook
> Subject: RE: Delimited text debug
>
> Hi Chris,
> I'm facing a problem here. We have most of our administrative area
> identified with a text key, but composed only of number ("09 ", "31").
> I have no way to choose to interpret text delimiters, and then, data is
> corrupted (09 becomes 9) and no way to join data with geographic layer..
> Is that a possible improvement to your plugin? I will file a ticket if needed.
> Cheers,
> Régis
>
> This message contains information, which is confidential and may be subject
> to legal privilege. If you are not the intended recipient, you must not
> peruse,
> use, disseminate, distribute or copy this message. If you have received this
> message in error, please notify us immediately (Phone 0800 665 463 or
> i...@linz.govt.nz) and destroy the original message. LINZ accepts no
> responsibility for changes to this email, or for any attachments, after its
> transmission from LINZ. Thank You.


_______________________________________________
Qgis-developer mailing list
Qgis-developer@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/qgis-developer