Re: [R] Pack and Unpack Strings in R

2009-01-09 Thread Henrique Dallazuanna
Try this:


## 1
map <- list(A = '00', C = '01', G = '10', T = '11')
myStr <- 'GATTA'
paste(map[unlist(strsplit(myStr, NULL))], collapse = "")

## 2
cod <- "100000"
library(gsubfn)
strapply(cod, '[0-9]{2}')
names(map)[match(unlist(strapply(cod, '[0-9]{2}')), map)]
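
A possible follow-up sketch, not part of the original reply: the code above yields a character string of '0' and '1', which still costs one byte per bit. To actually store two bits per base, base R's packBits() and rawToBits() can be used. The helper names pack_dna and unpack_dna, and the stored length field n, are assumptions made for illustration only.

```r
# Two-bit codes matching the map above: A = 00, C = 01, G = 10, T = 11.
codes <- c(A = 0L, C = 1L, G = 2L, T = 3L)

# Hypothetical helper: pack a DNA string (A/C/G/T only) into a raw vector.
pack_dna <- function(s) {
  idx  <- unname(codes[strsplit(s, NULL)[[1]]])      # 0..3 per base
  # Two bits per base, high bit first, interleaved as hi1, lo1, hi2, lo2, ...
  bits <- as.vector(rbind(idx %/% 2L, idx %% 2L))
  pad  <- (-length(bits)) %% 8L                      # packBits needs a multiple of 8
  list(n   = length(idx),                            # keep the base count to drop padding later
       raw = packBits(c(as.logical(bits), rep(FALSE, pad)), type = "raw"))
}

# Hypothetical helper: recover the string from the packed form.
unpack_dna <- function(p) {
  bits <- as.integer(rawToBits(p$raw))[seq_len(2L * p$n)]
  idx  <- bits[c(TRUE, FALSE)] * 2L + bits[c(FALSE, TRUE)]
  paste(names(codes)[idx + 1L], collapse = "")
}

p <- pack_dna("GATTA")
length(p$raw)   # 2 bytes for 5 bases
unpack_dna(p)   # "GATTA"
```

The base count is stored next to the raw vector because the padding bits would otherwise decode as extra 'A's. This sketch assumes the input contains only A, C, G and T.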

On Fri, Jan 9, 2009 at 1:50 PM, Gundala Viswanath gunda...@gmail.com wrote:

 Dear all,

 Does R have any function/package that can pack
 and unpack strings into bits?

 The reason I want to do this in R is that R
 has many more native statistical functions than Perl.

 Yet the data I need to process is so large that I
 need to compress it into smaller units, process it, and finally
 recover it back into strings with new information.

 In Perl the implementation would look like the code below.
 I wonder how this can be implemented in R.

 __BEGIN__
 my %charmap = (
     A => '00',
     C => '01',
     G => '10',
     T => '11',
 );

 my %digmap = (
     '00' => "A",
     '01' => "C",
     '10' => "G",
     '11' => "T",
 );

 my $string = 'GATTA';
 $string =~ s/(.)/$charmap{$1}/ge;

 my $compressed = pack 'b*', $string;

 print "COMP: $compressed\n";
 printf "%d bytes\n", length $compressed;

 my @data;

 # Store the compressed bits in an array
 push @data, $compressed;

 # process the array
 foreach my $dat ( @data ) {

   my $decompressed = unpack 'b*', $dat;
   $decompressed =~ s/(..)/$digmap{$1}/ge;

   print "$decompressed\n";
   # or do further processing on $dat
 }
 __END__


 - Gundala Viswanath
 Jakarta - Indonesia

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: [R] Pack and Unpack Strings in R

2009-01-09 Thread Martin Morgan

Gundala --

Gundala Viswanath wrote:

Dear all,

Does R have any function/package that can pack
and unpack strings into bits?


All of your questions relate to DNA strings. The R/Bioconductor package
Biostrings is designed to manipulate such objects. It does not
necessarily address this particular problem, for two reasons: in general
DNA strings can contain any of the 16 IUPAC symbols, so compression
becomes less compelling; and, as you indicate, even with compression the
size of the data means that one might often need to process parts of the
data at a time. It may nevertheless provide useful containers and
methods that make such issues less important.


 source('http://bioconductor.org/biocLite.R')
 biocLite('Biostrings')
 library('Biostrings')

See also the vignettes for the package, available within R or, for example, at

http://bioconductor.org/packages/release/bioc/html/Biostrings.html

It seems that you have data suitable for representation as a DNAStringSet.

The package is actively developed, and using the 'devel' version of R
(and hence the 'devel' version of Biostrings) might provide additional
important facilities. If this proves useful, follow-up questions
should go to the Bioconductor mailing lists:


http://bioconductor.org/docs/mailList.html

Martin





--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


Re: [R] Pack and Unpack Strings in R

2009-01-09 Thread jim holtman
see:

http://www.nabble.com/Compressing-String-in-R-td21160453.html

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
