On Mon, Feb 18, 2013 at 4:50 PM, Joel Pearson <[email protected]> wrote:
> I went with "filter" with an optional true/false regex switch because it
> seemed like the simplest way to use it, and closest to my own experience
> in using Excel's filters.
> Passing the symbol feels less intuitive, and yielding to a block means
> writing more code, particularly when I'm writing a quick method chain.
> The notation I set up feels natural to me when chaining criteria. For
> example I can just do this:
> data.filter( 'Account', /^P/ ).filter( 'Type', /^Large/, false )

What do you mean by "writing more code"?  It is as short as

data.filter( 'Account', /^P/ ).filter( 'Type' ) {|x| /^Large/ =~ x}

You could even implement Regexp#to_proc like this

class Regexp
  def to_proc
    lambda {|s| self =~ s}
  end
end

and then do

data.filter( 'Account', /^P/ ).filter( 'Type', &/^Large/)
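
To see the Regexp#to_proc idea in isolation, here is a minimal sketch that uses plain Array#select rather than your filter method (I am assuming filter would accept a block in the same way). The sample strings are made up for illustration.

```ruby
# Monkey patch from above: lets a Regexp be passed with & as a block.
class Regexp
  def to_proc
    # =~ returns the match position (truthy, even when 0) or nil.
    lambda { |s| self =~ s }
  end
end

# Made-up sample values, not data from the thread.
names = %w[Large Small Largest]

# &/^Large/ calls Regexp#to_proc and hands the result to select as a block.
large = names.select(&/^Large/)
p large
```

Keep in mind that reopening a core class like Regexp affects every piece of code in the process, so this is a convenience for scripts more than for libraries.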

> Regarding the usage of skip_headers
> Say I have this data:
>
> Type  Flag  Unique_ID
> Type1  1  A001
> Type2  0  A002
> Type1  0  A003
> Type3  1  A004
> Type1  1  A005
>
> If I only want to keep Parts of "Type1" and "Type3" then I could use
> "select" and some Regex, but I might pick up the Header as well if I'm
> not careful.
> Using a method like "skip_headers" allows me to select or reject
> elements of the data without losing the identifiers in the first row,
> which I'm almost always going to need at the end when I output the data
> into human-readable format.

But wouldn't you want the decision about what counts as a header to
be more flexible?  Possible criteria that come to mind are
 - first n lines / columns
 - first lines / columns where all values match a particular regexp
 - any line or column where all values match a particular regexp

> I'm also dealing with entire rows rather than individual cells, and
> since the source data can change its content and order, using the
> headers to identify the data source for a given operation is essential.
> Using skip_headers both allows me to preserve them while sorting through
> data, and also puts them back on again for the next time I need to
> reference them.

So basically you want a view on the data which omits a few rows and
columns.  Given that there are so many potential criteria I'd probably
pass in one object implementing === as a row header detector and one
as a column header detector.  Since Proc#=== is an alias for Proc#call,
you can also easily provide a lambda there.  The argument to === would
be the Row or Column instance, respectively, so the position as well as
the cell contents can be evaluated to decide whether something
constitutes a header row / column.  Once you have that in place you could create
convenience methods using one of the criteria mentioned above.  Just a
few thoughts.
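
A rough sketch of what I mean, for rows only. Row, HeaderDetectors and split_headers are names I made up for illustration; they are not part of your code, and a column detector would work analogously.

```ruby
# Hypothetical row type: position in the sheet plus the cell contents.
Row = Struct.new(:index, :cells)

# Hypothetical convenience constructors for two of the criteria above.
module HeaderDetectors
  # The first n rows are headers.
  def self.first_rows(n)
    lambda { |row| row.index < n }
  end

  # Any row where all cells match the pattern is a header.
  def self.all_match(pattern)
    lambda { |row| row.cells.all? { |cell| pattern === cell.to_s } }
  end
end

# Accepts anything that implements ===: a lambda, a Range, a custom class.
# Returns [header_rows, body_rows].
def split_headers(rows, detector)
  rows.partition { |row| detector === row }
end

# The sample data from your mail.
rows = [
  %w[Type  Flag Unique_ID],
  %w[Type1 1    A001],
  %w[Type2 0    A002],
  %w[Type1 0    A003],
  %w[Type3 1    A004],
  %w[Type1 1    A005]
].each_with_index.map { |cells, i| Row.new(i, cells) }

headers, body = split_headers(rows, HeaderDetectors.first_rows(1))

# Filter only the body, then put the headers back on top.
kept   = body.select { |row| /^Type[13]$/ =~ row.cells[0] }
result = headers + kept

# The same split, detected by content instead of position.
headers_by_content, _ = split_headers(rows, HeaderDetectors.all_match(/\A[A-Za-z_]+\z/))
```

Because the detector is only ever used through ===, the filtering code does not need to know which criterion is in effect.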

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
