Joel Neilson wrote:
> No problem - I apologize for the lack of clarity.
>
> >>> import rpy2.robjects as robjects
> >>> r = robjects.r
> >>> wilcox = robjects.r['wilcox.test']
> >>> vec1 = [1,2,3,4,5]
> >>> vec2 = [4,5,6,7,8]
> >>> rvec1 = robjects.FloatVector(vec1)
> >>> rvec2 = robjects.FloatVector(vec2)
> >>> address = wilcox(rvec1, rvec2)
> Warning message:
> In wilcox.test.default(c(1, 2, 3, 4, 5), c(4, 5, 6, 7, 8)) :
> cannot compute exact p-value with ties
> >>> address
> <RVector - Python:0x6c9e18 / R:0xda4608>
>
> >>> print address
>
> Wilcoxon rank sum test with continuity correction
>
> data: c(1, 2, 3, 4, 5) and c(4, 5, 6, 7, 8) #herein likely
> lies the problem if it's big
Yes. This is happening because:
- the R print method for objects of class 'htest' prints a description of
the data used (the 'data.name' element)
- from R's standpoint, the Python variables 'rvec1' and 'rvec2' are
anonymous (that is, data structures without any associated name/symbol),
so R deparses their full contents to build that description.
> W = 2, p-value = 0.03558
> alternative hypothesis: true location shift is not equal to 0
>
> #right here is the problem that I ran into. If I convert address to a
> string and split it to get out the p value,
That's not the most efficient way to proceed; you are converting a
(mostly) numerical data structure into a string in order to parse it and
extract one numerical value of interest. It is better to extract
your value of interest directly.
Try instead:
>>> test_res = wilcox(rvec1, rvec2)
>>> print(test_res.names)
[1] "statistic" "parameter" "p.value" "null.value" "alternative"
[6] "method" "data.name"
>>> test_res.subset('p.value')[0][0]
0.035578833239594126
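If you prefer not to hard-code the position of 'p.value', the lookup by
name can be sketched as a small helper. This is only a sketch: it assumes
the result object exposes '.names' and integer indexing, as in the session
above. 'named_element' and 'FakeHtest' are hypothetical names; the latter
is a stand-in so the snippet runs without R.

```python
def named_element(result, name):
    """Return the element of `result` whose R name equals `name`.

    Assumes `result.names` is iterable and `result[i]` is positional.
    """
    names = list(result.names)
    try:
        position = names.index(name)
    except ValueError:
        raise KeyError("no element named %r; available: %r" % (name, names))
    return result[position]


class FakeHtest(object):
    """Hypothetical stand-in for the object returned by wilcox(...)."""

    def __init__(self, names, values):
        self.names = names
        self._values = values

    def __getitem__(self, index):
        return self._values[index]


# Values copied from the session above; each element is itself a vector.
fake = FakeHtest(["statistic", "parameter", "p.value"],
                 [[2.0], [], [0.035578833239594126]])
p_value = named_element(fake, "p.value")[0]
print(p_value)
```

With a real result you would call named_element(test_res, 'p.value')[0]
in the same way.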
> #funny things start happening once the aggregate vector length (e.g.
> both of them) is about 1500
> #i think it is because R returns the primary data as illustrated above,
> and once that data line gets out to 1500 or
> #so converting the address to a string returns only one of the following
> four lines, and if it's the fourth line, that gets
> #truncated
There were/are issues with long strings (as you found).
> #but inside the R documentation for the wilcox test (below), i found
> that besides the above output, which i am
> #used to seeing, R is storing the following values as a list:
>
> 1. statistic
> the value of the test statistic with a name describing it.
> 2. parameter
> the parameter(s) for the exact distribution of the test statistic.
> 3. p.value
> the p-value for the test.
> 4. null.value
> the location parameter mu.
> 5. alternative
> a character string describing the alternative hypothesis.
> 6. method
> the type of test applied.
> 7. data.name
> a character string giving the names of the data.
> 8. conf.int
> a confidence interval for the location parameter. (Only present if
> argument conf.int = TRUE.)
> 9. estimate
> an estimate of the location parameter. (Only present if argument
> conf.int = TRUE.)
>
> #so directly extracting what you need from the stored variable seems to
> do the trick:
>
> >>> pval = str(address[2])
> >>> pval
> '[1] 0.03557883'
> >>> pvalactual = float(pval[4:])
> >>> pvalactual
> 0.035578829999999999
>
> #totally easy in hindsight, which is the way i guess most things are
> #but i hope this is helpful to other rookies who run into the problem
Same here: it is not necessary to convert a numerical value into its
string representation when you are primarily after the numerical value.
It is slower (and you are currently losing precision).
>>> pval = address[2][0]
>>> pval
0.035578833239594126
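To illustrate the loss of precision without needing R: R prints 7
significant digits by default, so the string only carries a rounded value.
The sketch below mimics that formatting in plain Python (the '%.7g' format
is my stand-in for R's printing, not what rpy2 does internally).

```python
full = 0.035578833239594126          # value obtained by direct indexing

# Mimic R's default printing of a length-1 numeric vector.
printed = "[1] %.7g" % full

# Parse it back the way the string-slicing approach does.
parsed = float(printed[4:])

print(printed)
print(parsed == full)                # the round trip does not give the value back
print(abs(parsed - full))            # size of the rounding error
```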
>
> On Apr 4, 2009, at 5:47 AM, Laurent Gautier wrote:
>
>> Joel,
>>
>> Good that you solved your issue.
>> However, I am not certain of what you mean by "extracting the required
>> object directly from the address rather than first converting the
>> address to a string".
>>
>> Self-contained examples often constitute a very efficient way to
>> demonstrate the problem when requesting help from the list.
>>
>>
>> L.
>>
>>
>>
>>
>>
>> Joel Neilson wrote:
>>> although i still don't understand what's happening and why, this
>>> problem went away if i extracted the required object directly from
>>> the address rather than first converting the address to a string or
>>> list and then indexing out what i wanted.
>>> i'm new to both python and computer science in general, so if this
>>> is obvious to everyone on the list i apologize. however, it seems
>>> that the others have run into analogous problems with long R outputs
>>> (see: '[Rpy] R console: long output' thread) and it was not obvious
>>> to me upon reading these threads precisely where the problem was
>>> occurring. now i know and hopefully this is useful information.
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> rpy-list mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/rpy-list
>>
>
> Joel R. Neilson, Ph.D.
> Research Scientist/Sharp Lab
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> 40 Ames Street, E17-528
> Cambridge, MA 02139
>
> t: 617.253.6457
> f: 617.253.3867
>
> [email protected]
>
>
>