On Mon, Feb 18, 2013 at 4:50 PM, Joel Pearson <[email protected]> wrote:
> I went with "filter" with an optional true/false regex switch because it
> seemed like the simplest way to use it, and closest to my own experience
> in using Excel's filters.
> Passing the symbol feels less intuitive, and yielding to a block means
> writing more code, particularly when I'm writing a quick method chain.
> The notation I set up feels natural to me when chaining criteria. For
> example I can just do this:
> data.filter( 'Account', /^P/ ).filter( 'Type', /^Large/, false )
What do you mean by "writing more code"? It is as short as
data.filter( 'Account', /^P/ ).filter( 'Type' ) {|x| /^Large/ =~ x}
You could even implement Regexp#to_proc like this
class Regexp
  def to_proc
    lambda {|s| self =~ s}
  end
end
and then do
data.filter( 'Account', /^P/ ).filter( 'Type', &/^Large/)
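
To make the comparison concrete, here is a minimal sketch of a filter that accepts either a pattern or a block. The Table class, its row layout (first row holds the headers) and the column lookup are assumptions for illustration only, not your actual implementation, and the true/false switch is left out.

```ruby
# Hypothetical stand-in for the data object discussed in this thread.
# Assumption: the first row holds the headers, the remaining rows the data.
class Table
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Keep the data rows whose cell in the named column satisfies either
  # the pattern (compared with ===) or the block. The header row is
  # always preserved.
  def filter(header, pattern = nil, &block)
    raise ArgumentError, "give a pattern or a block" unless pattern || block
    test = block || ->(cell) { pattern === cell }
    header_row, *body = rows
    idx = header_row.index(header) or raise ArgumentError, "no column #{header}"
    Table.new([header_row] + body.select { |row| test.call(row[idx]) })
  end
end

# The Regexp#to_proc extension from above, so that &/regex/ works.
class Regexp
  def to_proc
    lambda { |s| self =~ s }
  end
end

data = Table.new([
  ["Account", "Type"],
  ["P100", "Large"],
  ["P200", "Small"],
  ["Q300", "Large"],
])

# Pattern argument and block argument mixed in one chain.
result = data.filter("Account", /^P/).filter("Type", &/^Large/)
```

Because the pattern is compared with ===, the same method would also accept a String, a Range or a lambda as the second argument without further changes.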
> Regarding the usage of skip_headers
> Say I have this data:
>
> Type   Flag  Unique_ID
> Type1  1     A001
> Type2  0     A002
> Type1  0     A003
> Type3  1     A004
> Type1  1     A005
>
> If I only want to keep Parts of "Type1" and "Type3" then I could use
> "select" and some Regex, but I might pick up the Header as well if I'm
> not careful.
> Using a method like "skip_headers" allows me to select or reject
> elements of the data without losing the identifiers in the first row,
> which I'm almost always going to need at the end when I output the data
> into human-readable format.
But wouldn't you want the decision about what counts as a header to
be more flexible? Possible criteria that come to mind are
- first n lines / columns
- first lines / columns where all values match a particular regexp
- any line or column where all values match a particular regexp
> I'm also dealing with entire rows rather than individual cells, and
> since the source data can change its content and order, using the
> headers to identify the data source for a given operation is essential.
> Using skip_headers both allows me to preserve them while sorting through
> data, and also puts them back on again for the next time I need to
> reference them.
So basically you want a view on the data which omits a few rows and
columns. Given that there are so many potential criteria I'd probably
pass in one object implementing === as a row header detector and one
as a column header detector. Since Proc implements === as call you
can also easily provide a lambda there. The argument to === would be
the Column or Row instance respectively, so the position as well as the cell
contents can be evaluated to decide whether something constitutes a
header row / column. Once you have that in place you could create
convenience methods using one of the criteria mentioned above. Just a
few thoughts.
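
A rough sketch of the detector idea, using your example data. The Row struct and the split_headers helper are hypothetical names made up for illustration; the only thing the sketch relies on is that the detector responds to ===, which a lambda does because Proc#=== invokes the proc.

```ruby
# Hypothetical row wrapper: knows its position and its cell contents.
Row = Struct.new(:index, :cells)

# Two possible detectors. Anything responding to === would do.
first_row = ->(row) { row.index.zero? }
all_alpha = ->(row) { row.cells.all? { |c| /\A[A-Za-z_]+\z/ === c } }

# Split the rows into [header_rows, data_rows] using the detector.
def split_headers(rows, detector)
  rows.partition { |row| detector === row }
end

rows = [
  %w[Type Flag Unique_ID],
  %w[Type1 1 A001],
  %w[Type2 0 A002],
  %w[Type1 0 A003],
  %w[Type3 1 A004],
  %w[Type1 1 A005],
].each_with_index.map { |cells, i| Row.new(i, cells) }

headers, body = split_headers(rows, first_row)

# Select on the data rows only, then put the headers back on top.
kept   = body.select { |row| /\AType[13]\z/ === row.cells[0] }
output = (headers + kept).map(&:cells)
```

A convenience method such as skip_headers could then be a thin wrapper that supplies the first_row detector by default.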
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/