On Dec 19, 3:08 pm, Corey Murphy <rails-mailing-l...@andreas-s.net>
wrote:
> Frederick Cheung wrote:
> > Pretend you're split. You see the comma after doe so you split. Then
> > you see the other part of your regex and so you split again, resulting
> > in the empty array
>
> > Fred
>
> Ok, but the pipe character stands for alternation much like an OR so
> once it sees doe and splits, the next part of the csv input it evaluates
> should match to the monetary portion of the regex so it splits around
> it.  Maybe I'm just not getting something.

Maybe not. :)  Frederick is absolutely right.  Here's what I believe
happens when split is working on your input string.

The split routine starts parsing at the beginning of the string (I
will represent the cursor in these examples with a #.)
#john,doe,"1,200.99",training,12345 => []
At the moment, no results have been yielded, so the resulting array
(after the =>) is still empty.

Split runs until it sees a delimiter.
john#,doe,"1,200.99",training,12345 => ['john']

Split discards the delimiter.
john,#doe,"1,200.99",training,12345 => ['john']

Repeat for the next element ('doe') and delimiter (',').
john,doe,#"1,200.99",training,12345 => ['john', 'doe']

Having just read a delimiter (','), split is now prepared to read the
next value. But it doesn't see a value -- it sees another delimiter
('"1,200.99"', which matches your regex and is therefore considered a
delimiter just like ','). Split therefore writes an empty string as the
next element.
john,doe,#"1,200.99",training,12345 => ['john', 'doe', '']

Split skips over the delimiter -- but since the regexp contains
*capturing parentheses*, split inserts the parenthesized portion of
the match (which happens to match the whole delimiter) into the result
set.
john,doe,"1,200.99"#,training,12345 => ['john', 'doe', '',
'"1,200.99"']

Split sees another delimiter (',') with no intervening value, so it
writes another empty string. This time, the regexp that matches the
delimiter has no parentheses, so split does not put it into the result
set.
john,doe,"1,200.99",#training,12345 => ['john', 'doe', '',
'"1,200.99"', '']

Split reads the last two values, skipping the delimiter in between.
john,doe,"1,200.99",training,12345# => ['john', 'doe', '',
'"1,200.99"', '', 'training', '12345']

Does that make it clearer?  Since you defined the "1,200.99" string as
a delimiter, the fact that it appears in the result set at all is
actually accidental, and due 100% to the parentheses that you put
around it in the regexp.  Remember, the regexp in split defines the
pattern for the *delimiter*, not the *values*.
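
If you want to check the walkthrough in irb, here is a minimal sketch.
Note that the pattern below is a stand-in I made up for illustration,
not necessarily the regexp you are using; it just has the same shape
(a bare comma alternated with a parenthesized quoted value).

```ruby
line = 'john,doe,"1,200.99",training,12345'

# Assumed stand-in pattern: a comma, OR a quoted value wrapped in
# capturing parentheses.
delimiter = /,|("[^"]*")/

with_capture = line.split(delimiter)

# Same pattern with a non-capturing group: the quoted value is still
# treated as a delimiter, but it is no longer copied into the result.
without_capture = line.split(/,|(?:"[^"]*")/)

p with_capture
p without_capture
```

The first call should give the seven-element array from the walkthrough;
the second drops the quoted value entirely and leaves only the two empty
strings around it, which shows that the value was being consumed as a
delimiter all along.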

If you're still confused about what the parentheses do, here's a
minimal example:
'aabbcc'.split(/b+/) => ['aa', 'cc']
'aabbcc'.split(/(b+)/) => ['aa', 'bb', 'cc']


> because I want it to split my
> (comma delimited) input line based on comma or when a value is wrapped
> with double quotes.

I hate to break it to you, but split with a delimiter regexp is the
wrong tool for this kind of parsing, because the meaning of a comma is
*state-dependent* -- that is, a comma either acts as a delimiter or
not, depending on whether an odd or even number of double quotes have
been encountered since the beginning of the string.  A pattern that
only describes the delimiter doesn't track that state for you, so
you'll need to write some higher-level routines in Ruby -- or, as
others have suggested, simply use an off-the-shelf CSV parser.  (I'd
choose the latter approach if I were you -- parsers are a pain to get
right!)
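
To make the "state" idea concrete, here is a rough sketch of both
options.  The naive_split_csv helper is hypothetical and deliberately
simplistic (it ignores escaped quotes, embedded newlines, and so on);
it is only meant to show the in-quotes flag at work.  The second call
assumes the csv library that ships with current Ruby versions is
available; on older installs you may need a separate gem.

```ruby
require 'csv'

# Hypothetical helper: walks the line one character at a time and
# remembers whether the cursor is currently inside double quotes.
def naive_split_csv(line)
  fields = [String.new]
  in_quotes = false
  line.each_char do |ch|
    if ch == '"'
      in_quotes = !in_quotes   # toggle the state; the quote itself is dropped
    elsif ch == ',' && !in_quotes
      fields << String.new     # a comma outside quotes ends the field
    else
      fields.last << ch        # anything else belongs to the current field
    end
  end
  fields
end

line = 'john,doe,"1,200.99",training,12345'

by_hand = naive_split_csv(line)
by_library = CSV.parse_line(line)

p by_hand
p by_library
```

Both calls should return the five fields with 1,200.99 kept in one
piece, but the library also copes with the edge cases the sketch
skips, which is why I'd reach for it first.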

Best,
--
Marnen Laibow-Koser
mar...@marnen.org
http://www.marnen.org