Re: [R] merging-binning data

Alaios via R-help Wed, 04 Nov 2015 08:18:13 -0800

Thanks for your comments. Actually, only the last group has a single element. 
The first group is always "full" of members, and as such it works fine. Some 
constant spacing between the groups would be good as well, so I will check 
quantiles.
Thanks for the great support and the time invested in this.
Regards
Alex




     On Wednesday, November 4, 2015 3:34 PM, Boris Steipe 
<boris.ste...@utoronto.ca> wrote:
   

Which approach is "best" for defining subsets depends entirely on the 
semantics of the data. Your approach (a fixed number of equally spaced breaks) 
is the right one if the absolute ranges of the data are important. It should be 
obvious that either the top or the bottom group could contain only a single 
element, and also that any or all of the intermediate groups could be empty. 

If you want to control the number of elements in your groups, use quantiles 
instead. 
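A minimal sketch of the quantile approach, assuming five groups with breaks at the 0%, 20%, ..., 100% quantiles; dBin here is a subset of the values posted in the original question, and the variable names are illustrative only.

```r
# Assumed example data: a subset of the values posted in the question
dBin <- c(13.69540, 88.50923, 143.08590, 177.67884, 211.25559, 241.60905,
          277.54116, 342.94689, 381.95738, 483.76363, 521.20323)
nGroups <- 5  # assumed number of groups

# Breaks at the 0%, 20%, ..., 100% quantiles, so each group gets
# roughly the same number of elements (not the same width)
qBreaks <- quantile(dBin, probs = seq(0, 1, length.out = nGroups + 1))

# include.lowest = TRUE keeps min(dBin) in the first group;
# labels = FALSE returns integer group numbers 1..nGroups
qGroups <- cut(dBin, breaks = qBreaks, include.lowest = TRUE, labels = FALSE)

table(qGroups)  # number of elements per group
```

If the data contain many ties (the posted data repeat 252.52830 three times), neighbouring quantiles can coincide and cut() stops with an error about non-unique breaks; passing unique(qBreaks) is one workaround, at the cost of fewer groups.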

Your application may require you to define the breaks in other ways. The code I 
have given you doesn't generalize well, as it depends on the equal spacing of 
breaks. As I mentioned earlier, I would not store the groups at all, but would 
define a function that returns a vector of elements in the group; in the 
function body I would clearly and explicitly define the conditions for group 
membership (and comment them). That is how you make code for a task like this 
explicit and _maintainable_.
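One way to read this advice, as a sketch only: groupMembers() below is a hypothetical helper (not from any package) that returns the indices of the elements in group k, with the membership rule written out in the comments. The equal-width, left-closed bins are an assumption chosen to match the "[110,223)" style of labels used later in the thread; the last bin is also closed on the right so that the maximum is included.

```r
# Hypothetical helper (not from any package): indices of the elements of x
# that belong to group k out of nGroups equal-width groups.
# Membership rule, stated explicitly:
#   width = (max(x) - min(x)) / nGroups
#   group k < nGroups : min(x) + (k-1)*width <= x <  min(x) + k*width
#   group nGroups     : min(x) + (k-1)*width <= x <= max(x)
groupMembers <- function(x, k, nGroups = 5) {
  stopifnot(k >= 1, k <= nGroups)
  width <- (max(x) - min(x)) / nGroups
  lo <- min(x) + (k - 1) * width
  hi <- min(x) + k * width
  if (k < nGroups) {
    which(x >= lo & x < hi)       # left-closed interval [lo, hi)
  } else {
    which(x >= lo & x <= max(x))  # last interval also contains max(x)
  }
}

# Assumed example data: a subset of the values posted in the question
dBin <- c(13.69540, 88.50923, 143.08590, 177.67884, 211.25559, 241.60905,
          277.54116, 342.94689, 381.95738, 483.76363, 521.20323)
groupMembers(dBin, 2)  # indices 3, 4, 5
```

Because every boundary is computed by the same expression, adjacent groups share exactly the same boundary value, so no element can land in two groups or in none.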


Cheers,
Boris


On Nov 4, 2015, at 9:19 AM, Alaios <ala...@yahoo.com> wrote:

> Thanks, everything is solved and I was even able to plot boxplots as needed.
> The only minor issue is that the max element falls in the last category and is 
> the only element there. Perhaps this comes from the way my data look.
> Regards
> Alex
> 
> 
> 
> On Wednesday, November 4, 2015 3:06 PM, Boris Steipe 
> <boris.ste...@utoronto.ca> wrote:
> 
> 
> The (observed) breaks are just the min() and max() of the values in each group. Something like
> 
>  sprintf("[%5.2f,%5.2f]", min(dBin[groups==2]), max(dBin[groups==2]))
> 
> ... should achieve what you need.
> 
> 
> B.
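To get such a label for every group at once rather than one group at a time, the same sprintf() call can be applied to per-group minima and maxima from tapply(). This sketch assumes groups is computed with the equal-width recipe from earlier in the thread, with the maximum clamped into the last group; dBin is a subset of the posted values. Groups that end up empty simply do not appear in the result, and these observed ranges are generally narrower than the bin boundaries.

```r
# Assumed example data: a subset of the values posted in the question
dBin <- c(13.69540, 88.50923, 143.08590, 177.67884, 211.25559, 241.60905,
          277.54116, 342.94689, 381.95738, 483.76363, 521.20323)
nGroups <- 5

# Equal-width group numbers 1..nGroups (max(dBin) clamped into the last group)
groups <- (dBin - min(dBin)) / (max(dBin) - min(dBin))
groups <- pmin(floor(groups * nGroups) + 1, nGroups)

# Observed minimum and maximum of the values in each non-empty group
lo <- tapply(dBin, groups, min)
hi <- tapply(dBin, groups, max)

# One "[min,max]" label per group, named by group number
rangeLabels <- setNames(sprintf("[%5.2f,%5.2f]", as.numeric(lo), as.numeric(hi)),
                        names(lo))
rangeLabels
```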
> 
> 
> 
> On Nov 4, 2015, at 8:45 AM, Alaios <ala...@yahoo.com> wrote:
> 
> > You are right.
> > By labels I mean the "categories" or "breaks" that my data fall in.
> > For example, to be part of group 2 you have to be in the range [110,223).
> > I need to keep those for my plots.
> > 
> > Have I described it more precisely now?
> > Alex
> > 
> > 
> > 
> > On Wednesday, November 4, 2015 2:09 PM, Boris Steipe 
> > <boris.ste...@utoronto.ca> wrote:
> > 
> > 
> > I don't understand: 
> > - where does the "label" come from? (It's not an element of your data that 
> > I see.)
> > - what do you want to do with this "label" i.e. how does it need to be 
> > associated with the data?
> > 
> > 
> > B.
> > 
> > 
> > 
> > On Nov 4, 2015, at 7:57 AM, Alaios <ala...@yahoo.com> wrote:
> > 
> > > Thanks, it works great. It gives me group numbers as integers, so I can 
> > > select the elements of a group as needed (which(groups == 2)).
> > > 
> > > The question, though, is how to also keep the labels for each group, for 
> > > example that my first group is [13,206).
> > > 
> > > Regards
> > > Alex
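If the label should describe the bin itself rather than the values that happen to fall in it, one option is to keep the break points and build the labels from them. This is a sketch assuming equal-width bins that are left-closed like the "[13,206)" example, with the last bin closed so it contains the maximum; findInterval() with rightmost.closed = TRUE assigns each value to the bin from breaks[i] (inclusive) to breaks[i + 1] (exclusive). dBin is a subset of the posted values.

```r
# Assumed example data: a subset of the values posted in the question
dBin <- c(13.69540, 88.50923, 143.08590, 177.67884, 211.25559, 241.60905,
          277.54116, 342.94689, 381.95738, 483.76363, 521.20323)
nGroups <- 5  # assumed number of groups

# nGroups + 1 equally spaced break points from min(dBin) to max(dBin)
breaks <- seq(min(dBin), max(dBin), length.out = nGroups + 1)

# Labels "[lo,hi)" for each bin; the last bin is closed on both sides
binLabels <- sprintf("[%.0f,%.0f)", head(breaks, -1), tail(breaks, -1))
binLabels[nGroups] <- sprintf("[%.0f,%.0f]", breaks[nGroups], breaks[nGroups + 1])

# Group number of each element: breaks[i] <= x < breaks[i + 1],
# with the rightmost bin closed so max(dBin) falls into group nGroups
groups <- findInterval(dBin, breaks, rightmost.closed = TRUE)

# Label of the bin each element falls in, e.g. for plot annotations
groupLabel <- binLabels[groups]
```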
> > > 
> > > 
> > > 
> > > On Wednesday, November 4, 2015 1:00 PM, Boris Steipe 
> > > <boris.ste...@utoronto.ca> wrote:
> > > 
> > > 
> > > I would transform the original numbers into integers which you can use as 
> > > group labels. The row numbers of the group labels are the indexes of your 
> > > values.
> > > 
> > > Example: assume your input vector is dBin
> > > 
> > > nGroups <- 5  # number of groups
> > > groups <- (dBin - min(dBin)) / (max(dBin) - min(dBin))  # rescale to the range [0,1]
> > > # discretize to the integers 1..nGroups; pmin() puts max(dBin), which would
> > > # otherwise form group nGroups + 1 on its own, into the last group
> > > groups <- pmin(floor(groups * nGroups) + 1, nGroups)
> > > 
> > > Now you can, e.g., get the indices for group 2:
> > > 
> > > which(groups == 2)
> > > 
> > > Depending on the nature of your input data, it may be better to keep 
> > > these groups in a column adjacent to your values, rather than in a 
> > > separate vector, or even better to just calculate the groups on the fly 
> > > in your downstream analysis with the approach given above in a function, 
> > > rather than storing them at all. These are simple operations that should 
> > > not add perceptibly to execution time.
> > > 
> > > Cheers,
> > > Boris
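A sketch of the "column adjacent to your values" option, assuming a data frame is a convenient container: the hypothetical helper equalWidthGroup() wraps the rescale-and-floor recipe above (with the maximum clamped into the last group), so the same function can also be used to recompute the groups on the fly instead of storing them. dBin is a subset of the posted values.

```r
# Hypothetical helper: equal-width group number (1..nGroups) for each value
equalWidthGroup <- function(x, nGroups = 5) {
  scaled <- (x - min(x)) / (max(x) - min(x))  # rescale to [0, 1]
  pmin(floor(scaled * nGroups) + 1, nGroups)  # clamp max(x) into group nGroups
}

# Assumed example data: a subset of the values posted in the question
dBin <- c(13.69540, 88.50923, 143.08590, 177.67884, 211.25559, 241.60905,
          277.54116, 342.94689, 381.95738, 483.76363, 521.20323)

# Keep the group number in a column next to the values
df <- data.frame(value = dBin)
df$group <- equalWidthGroup(df$value)

which(df$group == 2)  # row indices of group 2
table(df$group)       # number of elements per group
```

With the group in its own column, boxplot(value ~ group, data = df) draws one box per group.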
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On Nov 4, 2015, at 6:40 AM, Alaios via R-help <r-help@r-project.org> 
> > > wrote:
> > > 
> > > > Thanks for the answer. split() does not give me the indexes, though, only 
> > > > which group the values fall in. I also need the index of the group: is it 
> > > > the first, the second, ... group?
> > > > Alex
> > > > 
> > > > 
> > > > 
> > > >    On Tuesday, November 3, 2015 5:05 PM, Ista Zahn <istaz...@gmail.com> wrote:
> > > > 
> > > > 
> > > > Probably
> > > > 
> > > > split(binDistance, test).
> > > > 
> > > > Best,
> > > > Ista
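Applied to the positions rather than the values, the same split() call returns the indices asked for further up in the thread, one vector per category. A sketch, assuming test is built with cut() roughly as in the original question (here with include.lowest = TRUE instead of subtracting 0.00001 from the minimum) and using a subset of the posted values.

```r
# Assumed example data: a subset of the values posted in the question
dBin <- c(13.69540, 88.50923, 143.08590, 177.67884, 211.25559, 241.60905,
          277.54116, 342.94689, 381.95738, 483.76363, 521.20323)
nGroups <- 5  # 'scaleLength' in the original question

test <- cut(dBin, breaks = seq(min(dBin), max(dBin), length.out = nGroups + 1),
            include.lowest = TRUE)

# Split the positions 1..n by category: a named list of index vectors,
# one per level of 'test' (empty categories give integer(0))
idxByGroup <- split(seq_along(dBin), test)

# The group number of each element, if an integer code is more convenient
groupNumber <- as.integer(test)

idxByGroup[[2]]  # indices of the elements in the 2nd category
```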
> > > > 
> > > > On Tue, Nov 3, 2015 at 10:47 AM, Alaios via R-help 
> > > > <r-help@r-project.org> wrote:
> > > >> Dear all, I am not exactly sure what the proper name is for what I am 
> > > >> trying to do.
> > > >> I have a vector that looks like
> > > >>  binDistance
> > > >>            [,1]
> > > >>  [1,] 238.95162
> > > >>  [2,] 143.08590
> > > >>  [3,]  88.50923
> > > >>  [4,] 177.67884
> > > >>  [5,] 277.54116
> > > >>  [6,] 342.94689
> > > >>  [7,] 241.60905
> > > >>  [8,] 177.81969
> > > >>  [9,] 211.25559
> > > >> [10,] 279.72702
> > > >> [11,] 381.95738
> > > >> [12,] 483.76363
> > > >> [13,] 480.98841
> > > >> [14,] 369.75241
> > > >> [15,] 267.73650
> > > >> [16,] 138.55959
> > > >> [17,] 137.93181
> > > >> [18,] 184.75200
> > > >> [19,] 254.64359
> > > >> [20,] 328.87785
> > > >> [21,] 273.15577
> > > >> [22,] 252.52830
> > > >> [23,] 252.52830
> > > >> [24,] 252.52830
> > > >> [25,] 262.20084
> > > >> [26,] 314.93064
> > > >> [27,] 366.02996
> > > >> [28,] 442.77467
> > > >> [29,] 521.20323
> > > >> [30,] 465.33071
> > > >> [31,] 366.60582
> > > >> [32,]  13.69540
> > > >> so numbers that start from 13 and go up to a maximum of 522 (I also have 
> > > >> many other similar sets). I want to put these numbers into 5 categories, 
> > > >> and thus I have tried cut:
> > > >> 
> > > >> 
> > > >> Browse[2]> test<-cut(binDistance,seq(min(binDistance)-0.00001,max(binDistance),length.out=scaleLength+1))
> > > >> Browse[2]> test
> > > >>  [1] (217,318]  (115,217]  (13.7,115] (115,217]  (217,318]  (318,420]
> > > >>  [7] (217,318]  (115,217]  (115,217]  (217,318]  (318,420]  (420,521]
> > > >> [13] (420,521]  (318,420]  (217,318]  (115,217]  (115,217]  (115,217]
> > > >> [19] (217,318]  (318,420]  (217,318]  (217,318]  (217,318]  (217,318]
> > > >> [25] (217,318]  (217,318]  (318,420]  (420,521]  (420,521]  (420,521]
> > > >> [31] (318,420]  (13.7,115]
> > > >> Levels: (13.7,115] (115,217] (217,318] (318,420] (420,521]
> > > >> 
> > > >> 
> > > >> Then I want the numbers of my initial vector that fall within the 
> > > >> same "category", let's say (318,420], to be collected in a vector. To 
> > > >> rephrase: I want the indexes of my initial vector that have a value 
> > > >> between 318 and 420 to be put in one vector that I can then process as I 
> > > >> want.
> > > >> How can I do that effectively in R?
> > > >> I would like to thank you for your reply.
> > > >> Regards
> > > >> Alex
> > > >> 
> > > >> ______________________________________________
> > > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >> PLEASE do read the posting guide 
> > > >> http://www.R-project.org/posting-guide.html
> > > >> and provide commented, minimal, self-contained, reproducible code.
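For the exact case in the question, a sketch using all 32 posted values: binDistance is a one-column matrix, so it is dropped to a plain vector first, and scaleLength is assumed to be 5 since five categories are requested. include.lowest = TRUE replaces the min(binDistance) - 0.00001 workaround; note that it also changes the first label from "(13.7,115]" to "[13.7,115]".

```r
# The 32 values posted in the question, as a one-column matrix
binDistance <- matrix(c(
  238.95162, 143.08590,  88.50923, 177.67884, 277.54116, 342.94689,
  241.60905, 177.81969, 211.25559, 279.72702, 381.95738, 483.76363,
  480.98841, 369.75241, 267.73650, 138.55959, 137.93181, 184.75200,
  254.64359, 328.87785, 273.15577, 252.52830, 252.52830, 252.52830,
  262.20084, 314.93064, 366.02996, 442.77467, 521.20323, 465.33071,
  366.60582,  13.69540), ncol = 1)

x <- binDistance[, 1]  # drop the matrix to a plain numeric vector
scaleLength <- 5       # assumed: five categories, as in the question

# Equally spaced breaks; include.lowest = TRUE keeps min(x) in the first category
test <- cut(x, breaks = seq(min(x), max(x), length.out = scaleLength + 1),
            include.lowest = TRUE)

# Indices of the elements in the 4th category (printed as (318,420] above)
inGroup4 <- which(as.integer(test) == 4)
x[inGroup4]            # the corresponding values
```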
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 


  