[sc-issues] [Issue 85946] CSV import - auto-detect delimiters

2010-03-24 Thread jnothman
To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=85946





--- Additional comments from jnoth...@openoffice.org Thu Mar 25 05:49:12 
+ 2010 ---
I think the first approach should be:
1) For Text Import on Open, use the last used settings for the given filename! 
I.e. keep a cached mapping with the document history. This may be impractical.
2) Otherwise, detect.

Detection of the text delimiter should perhaps precede detection of the column 
delimiter, because there is much greater variability in the latter, and it is 
hard to determine without having first stripped out quoted/escaped portions.

Assuming there is no quoting (i.e. it has been stripped, or it does not exist 
in the first place), one approach to finding column delimiters would be to find 
all non-alphanumeric characters which have a constant frequency >= 1 on all 
lines of the file.

Or, if we can't strip quoted text, find all non-alphanumeric, non-quote 
characters which occur at least a constant minimum number of times (>= 1) on 
each line.
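
A minimal Python sketch of the two frequency tests described above. The
function name, the `exact` switch and the choice of quote characters are
assumptions made for illustration; nothing here is taken from the OOo code.

```python
from collections import Counter

QUOTE_CHARS = "\"'"  # assumed set of text delimiters to ignore


def candidate_delimiters(lines, exact=True):
    """Return {char: count} for non-alphanumeric, non-quote characters.

    exact=True  -> the character occurs the same number of times (>= 1)
                   on every non-empty line.
    exact=False -> the character occurs at least once on every non-empty
                   line; the reported count is the per-line minimum.
    """
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if not rows:
        return {}
    counts = [Counter(row) for row in rows]
    result = {}
    # A valid candidate must appear on the first line, so only its
    # characters need to be examined.
    for ch in counts[0]:
        if ch.isalnum() or ch in QUOTE_CHARS:
            continue
        per_line = [c[ch] for c in counts]
        low = min(per_line)
        if low < 1:
            continue
        if exact and max(per_line) != low:
            continue
        result[ch] = low
    return result
```

When several characters survive (for example both ";" and a space), a
further heuristic is still needed to pick one.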

Heuristics or a machine learning approach might then select which result is 
most appropriate in case of conflict.


Alternatively, brute force it: determine which (common?) pairs of delimiters 
give the text integrity (same number of cols per line), and then use heuristics 
to decide (e.g. prefer quoted over unquoted; tab over semicolon, comma or 
space; more columns over fewer?).
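
The brute-force idea could look roughly like the following sketch, which uses
Python's standard csv module. The candidate lists, the function names and the
exact ranking are assumptions; the ranking simply encodes the example
preferences above (quoted over unquoted, tab first, then more columns).

```python
import csv
from itertools import product

DELIMITERS = ["\t", ";", ",", " "]  # assumed candidates, in preference order
QUOTES = ['"', "'", None]           # None means "no text delimiter"


def consistent_dialects(lines):
    """Return (delimiter, quote, columns) for every pair that splits all
    non-empty lines into the same number (> 1) of columns."""
    results = []
    for delim, quote in product(DELIMITERS, QUOTES):
        if quote is None:
            reader = csv.reader(lines, delimiter=delim,
                                quoting=csv.QUOTE_NONE)
        else:
            # A quote character that never occurs tells us nothing.
            if not any(quote in line for line in lines):
                continue
            reader = csv.reader(lines, delimiter=delim, quotechar=quote)
        try:
            widths = {len(row) for row in reader if row}
        except csv.Error:
            continue
        if len(widths) == 1:
            cols = next(iter(widths))
            if cols > 1:
                results.append((delim, quote, cols))
    return results


def best_dialect(lines):
    """Pick one result: quoted first, then delimiter order, then width."""
    ranked = sorted(
        consistent_dialects(lines),
        key=lambda r: (r[1] is None, DELIMITERS.index(r[0]), -r[2]),
    )
    return ranked[0] if ranked else None
```

For the two lines `"a;b";1;2` and `"c";3;4` only the semicolon/double-quote
pair gives every line the same column count, so that pair is selected.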

If OOo could acquire a collection of test text spreadsheets it might be helpful!

-
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

-
To unsubscribe, e-mail: issues-unsubscr...@sc.openoffice.org
For additional commands, e-mail: issues-h...@sc.openoffice.org


-
To unsubscribe, e-mail: allbugs-unsubscr...@openoffice.org
For additional commands, e-mail: allbugs-h...@openoffice.org



[sc-issues] [Issue 85946] CSV import - auto-detect delimiters

2009-12-11 Thread ali_b
To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=85946





--- Additional comments from al...@openoffice.org Fri Dec 11 11:28:23 + 
2009 ---
Some thoughts about this issue:

Auto-detecting delimiters would be great.
When the import dialog opens, the detected delimiters should already be set,
so all I have to do is click OK after checking whether the auto-detection was
correct.

There should also be a command-line option to provide all necessary CSV import
options, so that in this case no dialog would appear.

The last part is very important for automating some processes in companies.

Related issues are:
72981, 97416 and others




[sc-issues] [Issue 85946] CSV import - auto-detect delimiters

2008-02-07 Thread fabianvss
To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=85946
            Issue #|85946
            Summary|CSV import - auto-detect delimiters
          Component|Spreadsheet
            Version|1.0.0
           Platform|All
                URL|
         OS/Version|All
             Status|UNCONFIRMED
  Status whiteboard|
           Keywords|
         Resolution|
         Issue type|ENHANCEMENT
           Priority|P4
       Subcomponent|open-import
        Assigned to|spreadsheet
        Reported by|fabianvss





--- Additional comments from [EMAIL PROTECTED] Thu Feb  7 17:00:46 + 
2008 ---
The text import dialog that appears when opening a CSV file should analyze the
text and automatically select the most likely delimiter options.

By default, a comma is selected for separating fields and a double quote for
text. But often, e.g. in regions where the decimal separator is a comma instead
of a dot, semicolons are used for separating the fields, and it is very annoying
to change this every time a file is opened.

It should do the following:

- search for a tab character
- if found then use tabs as field delimiter
- else
  - count all commas and semicolons
  - if semicolons > commas then use semicolons as field delimiter
  - else use commas as field delimiter
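
As a rough sketch, the steps above translate to a few lines of Python. The
function name is made up for illustration, and the comparison is read as
"more semicolons than commas".

```python
def guess_field_delimiter(text):
    """Guess the field delimiter of CSV text using the simple rule above."""
    # A single tab anywhere is taken as evidence of tab-separated data.
    if "\t" in text:
        return "\t"
    # Otherwise the more frequent of semicolon and comma wins; ties and
    # texts without either character fall back to the comma.
    if text.count(";") > text.count(","):
        return ";"
    return ","
```

A row such as `1,5;2;3` contains two semicolons and one comma, so the
semicolon is chosen even though a decimal comma is present.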

Now, just counting text delimiters isn't sufficient as the following example of
a CSV row shows:
I'm here,I'm there,I'm nowhere, but aware

- count all double and single quote pairs around fields (field delimiter or
start of line on the left, and field delimiter or end of line on the right)
(a rough search could trigger this more complex search only when at least 2
occurrences are found; otherwise the number of pairs is treated as 0)
- if single quote pairs > double quote pairs then use single quotes as text
delimiter
- else use double quotes as text delimiter
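
One possible way to count quote pairs around fields is a regular expression
applied line by line, sketched below in Python. The function names are
hypothetical, the field delimiter is assumed to be known already, and the
comparison is read as "more single quote pairs than double quote pairs".

```python
import re


def count_quote_pairs(text, quote, delim):
    """Count fields fully wrapped in `quote`, i.e. an opening quote at the
    start of a line or right after `delim`, and a closing quote right
    before `delim` or the end of the line."""
    # Cheap pre-check: fewer than two quote characters cannot form a pair.
    if text.count(quote) < 2:
        return 0
    q, d = re.escape(quote), re.escape(delim)
    pattern = re.compile(rf"(?:^|(?<={d})){q}[^{q}]*{q}(?={d}|$)")
    return sum(len(pattern.findall(line)) for line in text.splitlines())


def guess_text_delimiter(text, delim):
    """Prefer single quotes only if they wrap more fields than double
    quotes do; otherwise fall back to double quotes."""
    single = count_quote_pairs(text, "'", delim)
    double = count_quote_pairs(text, '"', delim)
    return "'" if single > double else '"'
```

For the row `I'm here,I'm there,I'm nowhere, but aware` no apostrophe sits next
to a comma or a line boundary, so both pair counts are 0 and the double quote
remains the text delimiter.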
