Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Jan T Kim
On Wed, Mar 16, 2016 at 03:18:27PM -0400, Duncan Murdoch wrote:
> On 16/03/2016 1:40 PM, Jan Kim wrote:
> >Barry: that's an interesting hack.
> >
> >I do feel compelled to make two comments, though, regarding the
> >general issue rather than the scraping idea:
> >
> >(1) If your situation is that that image (.RData file) is the only
> >copy of the data, you'll need to rescue the data from that as soon as
> >possible anyway. Something like
> >
> > load(".RData");
> > write.csv(mydataframe, file = "mydata.csv");
> >
> >should do this trick. It will be slow, but you'll need to do it just
> >once, so you might as well enjoy your coffee while you wait. From that
> >point on, work with the mydata.csv file for getting at the colnames
> >(and anything else as well).
> >
> >(2) If there's any chance / risk that scraping data off images is not
> >a one-off, the time to prevent that from catching on is now. If data is
> >of any value at all, it should be handled in a sane, portable, textual
> >format. For tabular data, csv is normally adequate or at least good
> >enough, but .RData images are never a good idea.
> 
> I agree with the sentiment, but not with the choice of .csv as a
> "sane, portable, textual format".  CSV has no type information
> included, so strings that contain only digits can turn into numbers
> (and get rounded in the process), things that look like
> dates can get converted to different formats, etc.

I entirely agree. In hindsight, I should have stated that the .RData files,
as well as the R code to load and extract stuff from them, should be stored
permanently and documented.

> The .RData format has the disadvantages of being hard to use outside
> R, but at least it is usable in R.

yes -- that's why I thought it's a good idea to use R to pluck out the
valuable data, so (1) they can still be accessed even if the .RData
format changes and (2) they're in their own file, separated from the
(potentially humongous, see my P.S.) amount of other stuff caught up
in the image.

But to reiterate, the .RData file should be secured as well if that's
the only remaining primary / original source of the data.

> I don't know what I'd recommend if I wanted a portable textual
> format.  JSON is close, but it can't handle the full
> range of data that R can handle (e.g. no Inf).  dput() on a
> dataframe is text, but nothing but R can read it.

yes, that's the problem with "JSON", it's a JavaScript but not really
an object notation, as it doesn't store class structure metadata.

So again, the best bet is to secure multiple levels: the .RData
image to preserve the R types, the R script to be able to identify
the relevant variable(s), and the text version to avoid depending on
availability of R / an R version still able to read the image format.

Best regards, Jan



Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Lida Zeighami
Thank you Bert and Frederic.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Duncan Murdoch

On 16/03/2016 1:40 PM, Jan Kim wrote:
> Barry: that's an interesting hack.
>
> I do feel compelled to make two comments, though, regarding the
> general issue rather than the scraping idea:
>
> (1) If your situation is that that image (.RData file) is the only
> copy of the data, you'll need to rescue the data from that as soon as
> possible anyway. Something like
>
>  load(".RData");
>  write.csv(mydataframe, file = "mydata.csv");
>
> should do this trick. It will be slow, but you'll need to do it just
> once, so you might as well enjoy your coffee while you wait. From that
> point on, work with the mydata.csv file for getting at the colnames
> (and anything else as well).
>
> (2) If there's any chance / risk that scraping data off images is not
> a one-off, the time to prevent that from catching on is now. If data is
> of any value at all, it should be handled in a sane, portable, textual
> format. For tabular data, csv is normally adequate or at least good
> enough, but .RData images are never a good idea.


I agree with the sentiment, but not with the choice of .csv as a "sane,
portable, textual format".  CSV has no type information included, so
strings that contain only digits can turn into numbers (and get rounded
in the process), things that look like dates can get converted to
different formats, etc.
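
A small illustration of the type loss described above, as a sketch in
base R. The column names `code` and `when` and the sample values are
made up for the example; `colClasses` is the standard read.csv()
argument for stating types by hand.

```r
original <- data.frame(
  code = c("00123", "00456"),                      # character IDs with leading zeros
  when = as.Date(c("2016-03-16", "2016-03-19")),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(original, file = csv_path, row.names = FALSE)

# Default import: read.csv() guesses the column types from the text.
guessed <- read.csv(csv_path, stringsAsFactors = FALSE)
class(guessed$code)   # numeric type: the leading zeros are gone
class(guessed$when)   # plain character strings, no longer Date

# Stating the types by hand brings the original columns back.
typed <- read.csv(csv_path, colClasses = c(code = "character", when = "Date"))
all.equal(typed, original)
```

So the CSV itself is not enough; the column types have to be recorded
somewhere else, for instance in the script that reads the file.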

The .RData format has the disadvantages of being hard to use outside R, 
but at least it is usable in R.


I don't know what I'd recommend if I wanted a portable textual format.  
JSON is close, but it can't handle the full
range of data that R can handle (e.g. no Inf).  dput() on a dataframe is 
text, but nothing but R can read it.
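
For comparison, a sketch of the dput() route in base R. The data frame
`typed_frame` and its columns are invented for the example; the point
is only that the text file keeps the types (including Inf) but is R
source code that has to be read back with dget().

```r
typed_frame <- data.frame(
  code = c("00123", "00456"),
  size = c(1.5, Inf),      # Inf has no representation in standard JSON
  n    = c(10L, NA),
  stringsAsFactors = FALSE
)
txt_path <- tempfile(fileext = ".R")
dput(typed_frame, file = txt_path)

file_text <- readLines(txt_path)   # plain text, but it is R code
restored  <- dget(txt_path)        # only R can evaluate it again
all.equal(restored, typed_frame)
sapply(restored, class)
```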


Duncan Murdoch






Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Frederic Ntirenganya
I am not sure whether it is possible to get a column name from a dataset
without reading the data.



Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/



Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Bert Gunter
Is it really a .Rdata file? If so, the answer is no, AFAIK, since
.Rdata files are serialized (binary) versions of e.g. workspaces that
can contain many different data objects. "colnames" has no meaning in
this context.

Corrections welcome if I have it wrong!

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )




Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Jan Kim
Barry: that's an interesting hack.

I do feel compelled to make two comments, though, regarding the
general issue rather than the scraping idea:

(1) If your situation is that that image (.RData file) is the only
copy of the data, you'll need to rescue the data from that as soon as
possible anyway. Something like

load(".RData");
write.csv(mydataframe, file = "mydata.csv");

should do this trick. It will be slow, but you'll need to do it just
once, so you might as well enjoy your coffee while you wait. From that
point on, work with the mydata.csv file for getting at the colnames
(and anything else as well).
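
A minimal sketch of that rescue step, assuming the image may hold more
than one object. It loads the image into a scratch environment instead
of the global workspace and writes every data frame it finds to its own
CSV file. The helper name `rescue_data_frames` and the file naming are
illustrative choices, not anything established in this thread.

```r
# Hypothetical helper: export every data frame in an .RData image to CSV.
rescue_data_frames <- function(rdata_path, out_dir = dirname(rdata_path)) {
  env <- new.env()
  obj_names <- load(rdata_path, envir = env)   # load() returns the object names
  written <- character(0)
  for (nm in obj_names) {
    obj <- get(nm, envir = env)
    if (is.data.frame(obj)) {
      csv_path <- file.path(out_dir, paste0(nm, ".csv"))
      write.csv(obj, file = csv_path, row.names = FALSE)
      written <- c(written, csv_path)
    }
  }
  written
}

# Demonstration with a throwaway image that contains two objects.
demo_dir <- tempfile("rescue_")
dir.create(demo_dir)
mydataframe <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
notes <- "not a data frame"
image_path <- file.path(demo_dir, "demo.RData")
save(mydataframe, notes, file = image_path)

csv_files <- rescue_data_frames(image_path)
# The column names can now be read from the first lines of the CSV alone.
header_only <- names(read.csv(csv_files[1], nrows = 1))
```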

(2) If there's any chance / risk that scraping data off images is not
a one-off, the time to prevent that from catching on is now. If data is
of any value at all, it should be handled in a sane, portable, textual
format. For tabular data, csv is normally adequate or at least good
enough, but .RData images are never a good idea.

Best regards, Jan

P.S.: I've seen .RData images containing many months worth of interactive
work, and multiple variants of data frames in variables with more or less
similar names, so the set of strings scraped off these will be rather more
bewildering than in Barry's clean example.



-- 
 +- Jan T. Kim ---+
 | email: jtt...@gmail.com|
 | WWW:   http://www.jtkim.dreamhosters.com/  |
 *-=<  hierarchical systems are for files, not for humans  >=-*



Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Loris Bennett
Jan T Kim  writes:

> On Wed, Mar 16, 2016 at 03:18:27PM -0400, Duncan Murdoch wrote:
>> On 16/03/2016 1:40 PM, Jan Kim wrote:
>> >Barry: that's an interesting hack.
>> >
>> >I do feel compelled to make two comments, though, regarding the
>> >general issue rather than the scraping idea:
>> >
>> >(1) If your situation is that that image (.RData file) is the only
>> >copy of the data, you'll need to rescue the data from that as soon as
>> >possible anyway. Something like
>> >
>> > load(".RData");
>> > write.csv(mydataframe, file = "mydata.csv");
>> >
>> >should do this trick. It will be slow, but you'll need to do it just
>> >once, so you might as well enjoy your coffee while you wait. From that
>> >point on, work with the mydata.csv file for getting at the colnames
>> >(and anything else as well).
>> >
>> >(2) If there's any chance / risk that scraping data off images is not
>> >a one-off, the time to prevent that from catching on is now. If data is
>> >of any value at all, it should be handled in a sane, portable, textual
>> >format. For tabular data, csv is normally adequate or at least good
>> >enough, but .RData images are never a good idea.
>> 
>> I agree with the sentiment, but not with the choice of .csv as a
>> "sane, portable, textual format".  CSV has no type information
>> included, so strings that contain only digits can turn into numbers
>> (and get rounded in the process), things that look like
>> dates can get converted to different formats, etc.
>
> I entirely agree. In hindsight, I should have stated that the .RData files,
> as well as the R code to load and extract stuff from them, should be stored
> permanently and documented.
>
>> The .RData format has the disadvantages of being hard to use outside
>> R, but at least it is usable in R.
>
> yes -- that's why I thought it's a good idea to use R to pluck out the
> valuable data, so (1) they can still be accessed even if the .RData
> format changes and (2) they're in their own file, separated from the
> (potentially humongous, see my P.S.) amount of other stuff caught up
> in the image.
>
> But to reiterate, the .RData file should be secured as well if that's
> the only remaining primary / original source of the data.
>
>> I don't know what I'd recommend if I wanted a portable textual
>> format.  JSON is close, but it can't handle the full
>> range of data that R can handle (e.g. no Inf).  dput() on a
>> dataframe is text, but nothing but R can read it.
>
> yes, that's the problem with "JSON", it's a JavaScript but not really
> an object notation, as it doesn't store class structure metadata.
>
> So again, the best bet is to secure multiple levels: the .RData
> image to preserve the R types, the R script to be able to identify
> the relevant variable(s), and the text version to avoid depending on
> availability of R / an R version still able to read the image format.
>
> Best regards, Jan

The package 'h5' provides an R interface to HDF5 files.  I have used
neither, but am aware that HDF5 is a widely used format for storing
complex data structures.  Would that be useful?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Richard M. Heiberger
Barry's solution works on Windows without Cygwin.
You do need Rtools, available from the Windows page on CRAN.

Rtools does not have "gunzip", but that is just an abbreviation for "gzip -d".

x:\HOME\rmh\HH-R.package>path
PATH=c:\Progra~2\Rtools\bin;c:\Progra~2\Rtools\gcc-4.6.3\bin;c:\progra~1\R\R-3.2.3\bin\x64;c:\Progra~1\MikTeX~1.9\miktex\bin\x64;c:\windows;c:\windows\system32

x:\HOME\rmh\HH-R.package>gzip -d -c c:\Users\rmh.DESKTOP-60G4CCO\test.RData | strings -t d
  0 RDX2
 35 mydataframe
230 names
251 mylongnamehere
273 anotherlongname
314 aasdkjhasdkjhaskdj
347 row.names
389 class
410 data.frame



[R] How to reach the column names in a huge .RData file without loading it

2016-03-19 Thread Lida Zeighami
Hi,
I have a huge .RData file and I just need to get the colnames of it. Is
there any way to reach the column names without loading or reading the
whole file?
The file is so big, and I need to repeat this process several times,
so it takes a long time to load the file first and then take the colnames!

Thanks



Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-18 Thread Barry Rowlingson
You *might* be able to get them from the raw file...

First, I don't quite know what "colnames" of an .RData file means.
"colnames" are the column names of a matrix (or data frame), so I'll
assume your .RData file contains exactly one data frame and you want
the column names of it.

So let's create one of those:


mydataframe = data.frame(mylongnamehere=runif(3),
anotherlongname=runif(3), z=runif(3), y=runif(3),
aasdkjhasdkjhaskdj=runif(3))
save(mydataframe, file="./test.RData")

Now I'm going to use some Unix utilities to see if there are any
identifiable strings in the file. .RData files are by default
compressed using `gzip`, so I'll `gunzip` the file and pipe the output
into `strings`:

$ gunzip -c test.RData | strings -t d
  0 RDX2
 35 mydataframe
230 names
251 mylongnamehere
273 anotherlongname
314 aasdkjhasdkjhaskdj
347 row.names
389 class
410 data.frame


  - that's found the object name (mydataframe) and most of the column
names, except the short ones, which are too short for `strings` to
recognise. But if your names are long enough (4 or more chars is the
`strings` default, which `-n` can lower) they'll show up.

 Of course you'll have to filter them out from all the other string
output, but they should all appear shortly after the word "names",
since the colnames of a data frame are the "names" attribute of the
data.
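
 For anyone without `gunzip` and `strings` to hand, the same scrape can
be sketched in base R. This is an approximation of `strings`, not a
port of it: the function name `rdata_strings` and its arguments are
made up here, and it only looks for runs of printable ASCII. In the
output above the names sit after the column data (offset 230), so the
sketch streams through the whole decompressed file in chunks rather
than stopping early, but it never rebuilds the R objects.

```r
# Hypothetical helper: emulate `gunzip -c file | strings -n min_len`.
rdata_strings <- function(path, min_len = 4L, chunk_bytes = 1e6) {
  con <- gzfile(path, open = "rb")   # decompresses while reading
  on.exit(close(con))
  found <- character(0)
  carry <- raw(0)                    # printable run cut off by a chunk boundary
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_bytes)
    at_end <- length(chunk) == 0L
    buf <- c(carry, chunk)
    carry <- raw(0)
    if (length(buf) > 0L) {
      code <- as.integer(buf)
      runs <- rle(code >= 32L & code <= 126L)   # printable ASCII bytes
      ends <- cumsum(runs$lengths)
      starts <- ends - runs$lengths + 1L
      n <- length(ends)
      if (!at_end && runs$values[n]) {
        # The last run may continue in the next chunk, so hold it back.
        carry <- buf[starts[n]:ends[n]]
        runs$values[n] <- FALSE
      }
      for (i in which(runs$values & runs$lengths >= min_len)) {
        found <- c(found, rawToChar(buf[starts[i]:ends[i]]))
      }
    }
    if (at_end) break
  }
  found
}

mydataframe <- data.frame(mylongnamehere = runif(3), anotherlongname = runif(3),
                          z = runif(3), y = runif(3),
                          aasdkjhasdkjhaskdj = runif(3))
rdata_path <- tempfile(fileext = ".RData")
save(mydataframe, file = rdata_path)

found_strings <- rdata_strings(rdata_path)
# Candidate column names are whatever follows the "names" entry.
after_names <- found_strings[seq_along(found_strings) >
                             match("names", found_strings)]
```

 As with `strings`, the one-letter columns `z` and `y` are not found,
and random data bytes can occasionally show up as junk strings.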

 If you don't have a Unix or Mac machine handy you can get these
utilities on Windows via Cygwin but that's another story...

 Barry


Re: [R] How to reach the column names in a huge .RData file without loading it

2016-03-18 Thread Boris Steipe
However: if you need to repeat the process, as you wrote, you could store the 
column names in a separate object for future access after your first read.
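
A sketch of that idea, assuming the name of the data frame inside the
image is known. The helper `cached_colnames` and the ".colnames.rds"
sidecar file are hypothetical names chosen for the example; the image
is loaded only when no sidecar exists or the image is newer than it.

```r
# Hypothetical helper: column names of `object_name` in an .RData file,
# loading the file only when no up-to-date sidecar file exists.
cached_colnames <- function(rdata_path, object_name,
                            cache_path = paste0(rdata_path, ".colnames.rds")) {
  cache_is_fresh <- file.exists(cache_path) &&
    file.mtime(cache_path) >= file.mtime(rdata_path)
  if (cache_is_fresh) {
    return(readRDS(cache_path))
  }
  env <- new.env()
  load(rdata_path, envir = env)              # the slow step, done once
  cols <- colnames(get(object_name, envir = env))
  saveRDS(cols, cache_path)
  cols
}

mydataframe <- data.frame(mylongnamehere = 1:3, z = 4:6)
image_path <- tempfile(fileext = ".RData")
save(mydataframe, file = image_path)

first_call  <- cached_colnames(image_path, "mydataframe")  # loads the image
second_call <- cached_colnames(image_path, "mydataframe")  # reads the sidecar
```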

B.

