Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread John Engelhart
On Thu, Oct 1, 2009 at 7:09 PM, Colin Howarth wrote:

> Hi,
>
> I'd like my app ('tis progressing steadily :-) to read some simple tabular
> data. The original file is from Excel, but it's not too much bother to
> convert it to CSV.
>
> Searching the developer docs gets me
>
> "gestaltGraphicsVersion"
>
> which is not good, but googling "cocoa csv" works very well. The first
> couple of thousand hits seem relevant :-)
>
> Before I go through the 550,000 hits (some of them quite old) dare I ask if
> there's one Right Way (TM) to parse this sort of data?
>

One possible solution I haven't seen mentioned yet is to use RegexKitLite (
http://downloads.sourceforge.net/regexkit/RegexKitLite-3.1.tar.bz2).  The
documentation has an example that parses CSV strings (
http://regexkit.sourceforge.net/RegexKitLite/index.html#RegexKitLiteCookbook_PatternMatchingRecipes_TextFiles_ParsingCSVData).
When I ran some simple benchmarks parsing CSV files with the
documentation's example code, it significantly outperformed the other
solutions I tried.  One of those was the MacResearch example that's already
been mentioned, but this was some time ago, so things may have changed and
your mileage may vary. It only takes ~17 lines of code to parse CSV strings
into an NSArray (lines) of NSArrays (columns) using RegexKitLite, so
modifying it to fit your particular needs is very doable.
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread I. Savant

On Oct 2, 2009, at 4:19 PM, Alex Kac wrote:

> Yes! In any case, I'm sure libcsv is more powerful and correct, but
> the category there worked for my purposes working with several cloud
> services.


  You need only handle four things: quoted fields, line breaks within
fields, character encodings, and line-ending variations (CR/LF/CR+LF).
Cocoa handles two of these four automagically (character encodings in
most cases, line endings in all cases). The parsing category
on macresearch.org adds the other two.
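
The two parser-side items (quoted fields and line breaks within fields), plus tolerance for CR, LF, and CR+LF record endings, can be sketched in plain C. This is only an illustration, not the macresearch.org category and not libcsv: `csv_parse_buffer`, the two callback types, and `struct csv_counts` are names made up for this sketch, and character encodings are ignored entirely (the input is treated as raw bytes).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback types for this sketch (not from any library). */
typedef void (*csv_field_fn)(const char *text, size_t len, void *ctx);
typedef void (*csv_record_fn)(void *ctx);

/* Parse `len` bytes of CSV held in memory. Handles quoted fields, doubled
 * quotes (""), line breaks inside quoted fields, and CR, LF, or CR+LF
 * record endings. Completely empty lines are skipped.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int csv_parse_buffer(const char *buf, size_t len,
                     csv_field_fn on_field, csv_record_fn on_record,
                     void *ctx)
{
    char *field = malloc(len + 1);  /* a field can never exceed the input */
    size_t n = 0;                   /* bytes collected for the current field */
    int in_quotes = 0;              /* currently inside a "..." field? */
    int pending = 0;                /* has this record seen a quote or comma? */

    if (field == NULL)
        return -1;

    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (in_quotes) {
            if (c == '"' && i + 1 < len && buf[i + 1] == '"') {
                field[n++] = '"';   /* "" is an escaped quote */
                i++;
            } else if (c == '"') {
                in_quotes = 0;      /* closing quote */
            } else {
                field[n++] = c;     /* commas and newlines are literal here */
            }
        } else if (c == '"') {
            in_quotes = 1;
            pending = 1;
        } else if (c == ',') {
            on_field(field, n, ctx);
            n = 0;
            pending = 1;
        } else if (c == '\r' || c == '\n') {
            if (c == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++;                /* treat CR+LF as one line ending */
            if (pending || n > 0) {
                on_field(field, n, ctx);
                on_record(ctx);
            }
            n = 0;
            pending = 0;
        } else {
            field[n++] = c;
        }
    }
    if (pending || n > 0) {         /* last record without a trailing newline */
        on_field(field, n, ctx);
        on_record(ctx);
    }
    free(field);
    return 0;
}

/* Example callbacks that only count what they are given. */
struct csv_counts { size_t fields; size_t records; };

void count_field(const char *text, size_t len, void *ctx)
{
    (void)text; (void)len;
    ((struct csv_counts *)ctx)->fields++;
}

void count_record(void *ctx)
{
    ((struct csv_counts *)ctx)->records++;
}
```

Fed the bytes `a,"b,<LF>c",d<CR><LF>e,f` with the counting callbacks, this reports five fields in two records, because the comma and line feed inside the quotes stay part of the second field. The field text passed to the callback is not NUL-terminated, so a real callback would copy `len` bytes.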


  I'm fascinated by the NSMutableCharacterSet efficiency issue but  
the category seems to work quickly on impressively-sized files  
(100,000 records with 100 fields, with reasonably long text in the  
fields), despite its memory inefficiencies. I'll be giving it a test  
with Mike's suggested modification to see how much better it runs,  
though. :-)


--
I.S.




Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Alex Kac
Yes! In any case, I'm sure libcsv is more powerful and correct, but  
the category there worked for my purposes working with several cloud  
services.


On Oct 2, 2009, at 3:16 PM, I. Savant wrote:

> On Oct 2, 2009, at 4:15 PM, Alex Kac wrote:
>
>> Here is something I use that has worked for me fairly well. I found
>> it either on this list or somewhere on the web, so sharing back to
>> the list.
>>
>> http://pastie.org/639863
>
> This appears to be the code listing from the article I mentioned on
> MacResearch.org. :-)
>
> --
> I.S.




Alex Kac - President and Founder
Web Information Solutions, Inc.

"It is useless for sheep to pass a resolution in favor of  
vegetarianism while wolves remain of a different opinion."

-- William Randolph Inge







Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread I. Savant

On Oct 2, 2009, at 4:15 PM, Alex Kac wrote:

> Here is something I use that has worked for me fairly well. I found
> it either on this list or somewhere on the web, so sharing back to
> the list.
>
> http://pastie.org/639863

  This appears to be the code listing from the article I mentioned on
MacResearch.org. :-)


--
I.S.




Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread I. Savant

On Oct 2, 2009, at 4:03 PM, Colin Howarth wrote:

> CSV isn't *that* hard to parse, once you know about quotes and NLs
> inside cells.


  ... and encodings and line endings. Don't forget how much goodness  
Cocoa gives you automagically. :-)


--
I.S.






Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Alex Kac
Here is something I use that has worked for me fairly well. I found it  
either on this list or somewhere on the web, so sharing back to the  
list.


http://pastie.org/639863





Alex Kac - President and Founder
Web Information Solutions, Inc.

"You cannot build a reputation on what you intend to do."
-- Liz Smith






Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Colin Howarth
Yes, thanks Mike (and all others). My particular table is only around  
100 x 100 cells, so Drew's code is fine.


Regarding the efficiency points raised in these posts, perhaps
Stephen's pointer

> libcsv is a potential option.   http://sourceforge.net/projects/libcsv/


is a good idea. CSV isn't *that* hard to parse, once you know about  
quotes and NLs inside cells. Maybe a Cocoa wrapper enabling one to  
*use* libcsv in a Cocoa-ish way would be a good idea. You don't really  
need to create an instance with


[MyLetter letterWithLetter: (chr) aLetter encodedUsingEncoding:  
(NSStringEncoding *) someEncoding possiblyUsingSomeFont: (NSFont *)  
aFont]


or

[MyLetter createALetterByReadingFromString:(NSString *)  
theStringImLookingAt whichMightBeEncodedUsingEncoding:(NSEncoding *)  
anEncoding  
butIfItIsntEncodedOrIfICantFigureOutTheEncodingLookAtThisError:  
(NSError ***) error andAssignANiceFont: (NSFont *)font  
andOfCourseASize: (NSPointSize *) makeAPointSizeNumberUsingANumber 
( pointSize)]


for each letter in the file and then

[parser lookAtTheLetterIJustRead: (MyLetter *) theLetter  
andDecideIfItBelongsToTheMutableClassOfSeparatorCharacters:  
(NSMutableArray *) ...


...

:-)   (sorry)

I mean, one doesn't *have to* implement the parser in Objective-C,  
does one?


Maybe I'll have a go (I need the practice) when I've Finished (TM) my  
current project...



-- colin







Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Kyle Sluder
On Fri, Oct 2, 2009 at 11:27 AM, DKJ wrote:
> I've just been using NSXMLParser for the first time. Can Excel files be
> saved in XML format? If so, would NSXMLParser be a possible solution here?

XML isn't a format, per se.  It's a structured markup language.  The
actual layout of the data (the schema) is undefined.  I might do this:


  123


While someone else might do this:


  
123
  


Both of these are XML, but in completely different schemata.  So "XML
format" is a bit nonsensical.

The native file format of Excel 2007 and 2008 is indeed an XML-based
format, and it's even a standard.  It's also so terribly complex that
there is as of yet no 100%-compliant editor of this format.

--Kyle Sluder


Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread DKJ
I've just been using NSXMLParser for the first time. Can Excel files  
be saved in XML format? If so, would NSXMLParser be a possible  
solution here?


dkj


Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Adam R. Maxwell


On Oct 2, 2009, at 6:10 AM, I. Savant wrote:

> On Oct 2, 2009, at 7:42 AM, Mike Abdullah wrote:
>
>> While using this code in an experimental project I found the app
>> was routinely using 500+ MB of RAM. When measured with Instruments
>> I realised that every time you use a character set for string
>> scanning, Foundation internally copies it, presumably to ensure it
>> has an immutable object to work with. As a result, potentially
>> thousands of copies are being created, resulting in either:
>>
>> A) Outrunning the garbage collector
>> B) Spending far more time allocating and deallocating character
>> sets than doing the actual scanning
>>
>> Easiest solution is just to make a single copy yourself early on.
>> Also the docs for NSMutableCharacterSet do point out that it's
>> inefficient but don't really offer any detail.
>
> Very good to know, thanks, Mike. You might want to post a comment
> on the macresearch.org page so Drew can adjust the example.

I've also run into major performance problems with -[NSScanner
scanCharactersFromSet:], which creates an inverted, autoreleased
character set each time it's called.  This blows up autorelease pools
(and inverting character sets isn't fast either).  Filed as rdar://problem/4652388.

My workaround for this is generally to invert the set myself and use
scanUpToCharactersFromSet:, but that could be hurting this example as
well.

> Do you happen to remember what percentage of difference you were
> able to achieve? It's fine if not - it's not too difficult to set up
> a test. :-)

In my case I was running out of address space and crashing, which is
pretty easy to detect :).






Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread I. Savant

On Oct 2, 2009, at 7:42 AM, Mike Abdullah wrote:

> While using this code in an experimental project I found the app was
> routinely using 500+ MB of RAM. When measured with Instruments I
> realised that every time you use a character set for string
> scanning, Foundation internally copies it, presumably to ensure it
> has an immutable object to work with. As a result, potentially
> thousands of copies are being created, resulting in either:
>
> A) Outrunning the garbage collector
> B) Spending far more time allocating and deallocating character sets
> than doing the actual scanning
>
> Easiest solution is just to make a single copy yourself early on.
> Also the docs for NSMutableCharacterSet do point out that it's
> inefficient but don't really offer any detail.



  Very good to know, thanks, Mike. You might want to post a comment  
on the macresearch.org page so Drew can adjust the example.


  Do you happen to remember what percentage of difference you were  
able to achieve? It's fine if not - it's not too difficult to set up a  
test. :-)


--
I.S.




Re: Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Stephen Hoffman
> Before I go through the 550,000 hits (some of them quite old) dare I
> ask if there's one Right Way (TM) to parse this sort of data?


I'm not aware of one Right Way (™) here.

If you're stuck due to somebody else's decision to use CSV, libcsv is  
a potential option.


http://sourceforge.net/projects/libcsv/



Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Mike Abdullah
While using this code in an experimental project I found the app was  
routinely using 500+ MB of RAM. When measured with Instruments I  
realised that every time you use a character set for string scanning,  
Foundation internally copies it, presumably to ensure it has an  
immutable object to work with. As a result, potentially thousands of  
copies are being created, resulting in either:


A) Outrunning the garbage collector
B) Spending far more time allocating and deallocating character sets  
than doing the actual scanning


Easiest solution is just to make a single copy yourself early on. Also  
the docs for NSMutableCharacterSet do point out that it's inefficient  
but don't really offer any detail.


Mike.

On 2 Oct 2009, at 12:02, I. Savant wrote:

> On Oct 2, 2009, at 6:34 AM, Mike Abdullah wrote:
>
>> inefficient due to its use of NSMutableCharacterSet.
>
> Could you expand on this? Once created and manipulated, what makes
> it slow for string scanning compared to NSCharacterSet? I hadn't
> heard this.
>
> --
> I.S.








Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread I. Savant

On Oct 2, 2009, at 6:34 AM, Mike Abdullah wrote:


> inefficient due to its use of NSMutableCharacterSet.


  Could you expand on this? Once created and manipulated, what makes  
it slow for string scanning compared to NSCharacterSet? I hadn't heard  
this.


--
I.S.






Re: reading (parsing) CSV (or Excel) data

2009-10-02 Thread Mike Abdullah


On 2 Oct 2009, at 02:37, Colin Howarth wrote:


> Thanks everyone!
>
> The idiot savant was first to point out the
> http://macresearch.org/cocoa-scientists-part-xxvi-parsing-csv-data
> link. And that one refers also to the cocoadev site.


As a warning, that particular bit of code is pretty inefficient due to
its use of NSMutableCharacterSet. You will find that with any large-ish
data set it starts allocating vast quantities of memory. But not
to worry, it's dead easy to fix; just make sure you -copy to create an
NSCharacterSet before scanning and it will be more efficient.


Mike.



Re: reading (parsing) CSV (or Excel) data

2009-10-01 Thread Colin Howarth

Thanks everyone!

The idiot savant was first to point out the
http://macresearch.org/cocoa-scientists-part-xxvi-parsing-csv-data
link. And that one refers also to the cocoadev site.


It seems everyone expects there to be a standard Cocoa solution, is  
surprised that there isn't, and develops their own, or goes stumbling  
around the web to see what others have come up with. :-)



--colin


On 2 Oct, 2009, at 03:13, Greg Guerin wrote:

> http://www.cocoadev.com/index.pl?ReadWriteCSVAndTSV
>
> That site has enough good stuff, often with links to other stuff,
> that it's worth using a site:cocoadev.com qualifier in your
> googling.  Or just start there.
>
>  -- GG




Re: reading (parsing) CSV (or Excel) data

2009-10-01 Thread Greg Guerin


http://www.cocoadev.com/index.pl?ReadWriteCSVAndTSV

That site has enough good stuff, often with links to other stuff,  
that it's worth using a site:cocoadev.com qualifier in your  
googling.  Or just start there.


  -- GG




Re: reading (parsing) CSV (or Excel) data

2009-10-01 Thread Rick Genter


On Oct 1, 2009, at 5:22 PM, Jens Alfke wrote:

> On Oct 1, 2009, at 4:09 PM, Colin Howarth wrote:
>
>> Before I go through the 550,000 hits (some of them quite old) dare
>> I ask if there's one Right Way (TM) to parse this sort of data?
>
> If the data's not too huge, you can read the file into an NSString,
> break that into lines (there are some NSString methods for this, but
> I don't remember their names), and then on each line call [line
> componentsSeparatedByString: @","] to get the values in an NSArray.
> The values will be strings not numbers, but you can call -intValue
> or -doubleValue on each one to convert it.


As has been pointed out, .csv files are not quite that simple.

The .csv format is actually an official RFC (RFC-4180). .csv is not  
very difficult to parse, but there are some subtleties around the  
quoting and newlines:


http://www.rfc-editor.org/rfc/rfc4180.txt

I don't know whether Excel strictly adheres to the RFC, but a .csv  
parser that I wrote a number of years ago that follows the RFC has so  
far been quite successful parsing Excel-generated .csvs without  
complaint.

--
Rick Genter
rick.gen...@gmail.com







Re: reading (parsing) CSV (or Excel) data

2009-10-01 Thread BareFeet

On 02/10/2009, at 10:22 AM, Jens Alfke wrote:

> If the data's not too huge, you can read the file into an NSString,
> break that into lines (there are some NSString methods for this, but
> I don't remember their names), and then on each line call [line
> componentsSeparatedByString: @","] to get the values in an NSArray.


That won't work. CSV can contain commas and newlines within items if  
encapsulated in quotes.
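
To illustrate the failure with a concrete line, here is a small sketch in C that counts fields two ways: splitting on every comma (what `componentsSeparatedByString:` effectively does), and skipping commas that sit between double quotes. The function names are invented for this example, and it only counts fields on a single line; it does not deal with newlines inside quotes.

```c
#include <stddef.h>

/* Naive approach: every comma starts a new field. */
size_t count_fields_naive(const char *line)
{
    size_t fields = 1;
    for (; *line != '\0'; line++)
        if (*line == ',')
            fields++;
    return fields;
}

/* Quote-aware approach: commas between double quotes belong to the field.
 * An escaped quote ("") toggles the flag twice, so the state is unchanged. */
size_t count_fields_quoted(const char *line)
{
    size_t fields = 1;
    int in_quotes = 0;
    for (; *line != '\0'; line++) {
        if (*line == '"')
            in_quotes = !in_quotes;
        else if (*line == ',' && !in_quotes)
            fields++;
    }
    return fields;
}
```

For the line `Smith,"Portland, OR",42` the naive count is 4 while the quote-aware count is 3, which is the number of cells the spreadsheet actually had.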


> The values will be strings not numbers, but you can call -intValue
> or -doubleValue on each one to convert it.


Yes, CSV doesn't specify types, just strings, so any interpretation  
you make on types is after the actual CSV import.
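
As a sketch of doing that interpretation step explicitly, the following C function converts one already-parsed field to a double and reports whether the text really was a number. `field_to_double` is a made-up helper name; it uses the standard `strtod`, and rejecting empty fields and trailing junk is a policy chosen for this example rather than anything CSV requires.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert one already-parsed CSV field to a double.
 * Returns 1 and stores the value in *out on success; returns 0 if the
 * field is empty, has non-blank text after the number, or is out of range. */
int field_to_double(const char *field, double *out)
{
    char *end;
    double value;

    errno = 0;
    value = strtod(field, &end);   /* skips leading whitespace itself */
    if (end == field)
        return 0;                  /* nothing numeric was found */
    while (*end == ' ' || *end == '\t')
        end++;                     /* tolerate trailing blanks */
    if (*end != '\0')
        return 0;                  /* junk after the number, e.g. "12abc" */
    if (errno == ERANGE)
        return 0;                  /* overflow or underflow */
    *out = value;
    return 1;
}
```

Checking the result this way lets the caller tell a cell containing `0` apart from a cell containing text, which a conversion that silently falls back to zero cannot do.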


I found this method (the one under the heading "General CSV"), which  
works fine, caters for delimiters within quotes, escaped quotes etc:


http://macresearch.org/cocoa-scientists-part-xxvi-parsing-csv-data

Tom
BareFeet



Re: reading (parsing) CSV (or Excel) data

2009-10-01 Thread Jens Alfke


On Oct 1, 2009, at 4:09 PM, Colin Howarth wrote:

> Before I go through the 550,000 hits (some of them quite old) dare I
> ask if there's one Right Way (TM) to parse this sort of data?


If the data's not too huge, you can read the file into an NSString,
break that into lines (there are some NSString methods for this, but I
don't remember their names), and then on each line call [line
componentsSeparatedByString: @","] to get the values in an NSArray.
The values will be strings not numbers, but you can call -intValue or
-doubleValue on each one to convert it.


If you have huge amounts of data, like hundreds of thousands of lines,  
this is going to become noticeably inefficient, so you might need to  
start reading and parsing directly from the file.
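
One way to read directly from the file, sketched in C under the assumption that a single pass with constant memory is what's wanted: pull one character at a time from a `FILE *` and track only whether the scanner is inside quotes. `csv_count_records` is a hypothetical name, and the sketch merely counts records; a real reader would collect fields at the same points.

```c
#include <stdio.h>

/* Count CSV records in a stream without loading the file into memory.
 * Line breaks inside double quotes do not end a record, and empty lines
 * are ignored, so CR, LF, and CR+LF endings all count once. */
long csv_count_records(FILE *fp)
{
    long records = 0;
    int in_quotes = 0;   /* inside a "..." field? */
    int has_data = 0;    /* current record has at least one character */
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == '"') {
            in_quotes = !in_quotes;   /* "" toggles twice, net no change */
            has_data = 1;
        } else if (!in_quotes && (c == '\n' || c == '\r')) {
            if (has_data)
                records++;
            has_data = 0;             /* the LF of a CR+LF then sees no data */
        } else {
            has_data = 1;
        }
    }
    if (has_data)
        records++;                    /* last record without a newline */
    return records;
}
```

Memory use stays flat however long the file is, since nothing but three small counters is kept between characters.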


--Jens


reading (parsing) CSV (or Excel) data

2009-10-01 Thread Colin Howarth

Hi,

I'd like my app ('tis progressing steadily :-) to read some simple  
tabular data. The original file is from Excel, but it's not too much  
bother to convert it to CSV.


Searching the developer docs gets me

"gestaltGraphicsVersion"

which is not good, but googling "cocoa csv" works very well. The first  
couple of thousand hits seem relevant :-)


Before I go through the 550,000 hits (some of them quite old) dare I  
ask if there's one Right Way (TM) to parse this sort of data?


Thanks,

--colin