Re: [R] Extracting File Basename without Extension

2009-01-14 Thread William Dunlap
The S+ basename() function has an argument called suffix and
it will remove the suffix from the result.  This was based on
the Unix basename command, but I missed the special case in the
Unix basename that doesn't remove the suffix if the removal
would result in an empty string.  The suffix must include any
initial dot.  Unlike the Unix basename, the suffix is a regular
expression (but other regexpr arguments like ignore.case and
fixed are not accepted - there ought to be a regular expression
class so these things get attached to the expression instead of
added to the call). 
  
  > basename(c("foobar", "dir/foo.bar"), suffix=".bar")
  [1] "fo"  "foo"
  > basename(c("foobar", "dir/foo.bar"), suffix="\\.bar")
  [1] "foobar" "foo"
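
The same pitfall can be reproduced in base R, whose basename() has no
suffix argument.  The following is only a sketch using sub(); it is not
S+ code, and the vector x is made up for illustration:

```r
# Illustrative file names (not taken from S+ documentation)
x <- c("foobar", "dir/foo.bar")

# An unescaped dot is a regex metacharacter, so ".bar$" also matches "obar"
unescaped <- sub(".bar$", "", basename(x))
print(unescaped)

# An escaped dot matches only a literal "."
escaped <- sub("\\.bar$", "", basename(x))
print(escaped)
```

The first call strips four characters from "foobar" as well, which is
the behaviour shown in the S+ output above.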
 
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting File Basename without Extension

2009-01-11 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 On Fri, Jan 9, 2009 at 4:20 PM, Wacek Kusnierczyk

   
 right;  there's a straightforward fix to my solution that accounts for
 cases such as '.bashrc':

 names = c("foo.bar", ".zee")
 sub("(.+)[.][^.]+$", "\\1", names)

 you could also use a lookbehind if possible (not in r, afaik).

 

 or:

   
 sub(".*[.]", ".", names)
 
 [1] ".bar" ".zee"
   

it was foo that was desired...

vQ



Re: [R] Extracting File Basename without Extension

2009-01-11 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 On Fri, Jan 9, 2009 at 4:28 PM, Wacek Kusnierczyk
 waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
   
 Rau, Roland wrote:
 
 P.S. Any suggestions how to become more proficient with regular
 expressions? The O'Reilly book (Mastering...)? Whenever I tried
 anything more complicated than basic usage (things like ^ $ * . ) in R,
 I was way faster to write a new function (like above) instead of finding
 a regex solution.

   
 the book you mention is good.
 you may also consider http://www.regular-expressions.info/

 regexes are usually well explained with lots of examples in perl books.

 
 By the way: it might be still possible to *write* regular expressions,
 but what about code re-use? Are there people who can easily *read*
 complicated regular expressions?

   
 in some cases it is possible to write regular expressions in a way that
 facilitates reading them by a human.  in perl, for example, you can use
 so-called readable regexes:

 /
   (.+)# match and remember at least one arbitrary character
   [.] # match a dot
   [^.]+ # match at least one non-dot character
   $  # end of string anchor
 /x;

 you can also use within regex comments:

 /(.+)(?# one or more chars)[.](?# a dot)[^.]+(?# one or more
 non-dots)$(?# end of string)/


 nothing of the sorts in r, however.
 

 Supports that if you begin the regular expression with (?x) and
 use perl = TRUE.  See ?regexp
   

cool, i see ?xism is supported.  so the above can be written in r as:

names = c("foo.bar", ".zee")
sub("(?x) # allow embedded comments
 (.+) # match and remember at least one arbitrary character
 [.] # match a dot
 [^.]+ # match at least one non-dot character
 $ # end of string anchor",
"\\1", names, perl=TRUE)

is this what you wanted, roland?

vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Berwin A Turlach
G'day all,

On Fri, 9 Jan 2009 08:12:18 -0200
Henrique Dallazuanna www...@gmail.com wrote:

 Try this also:
 
 substr(basename(myfile), 1, nchar(basename(myfile)) - 4)

Or, in case that the extension has more than three letters or myfile
is a vector of names:

R> myfile <- "path1/path2/myoutput.txt"
R> sapply(strsplit(basename(myfile),"\\."), function(x)
     paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput"
R> myfile2 <- c(myfile, "path2/path3/myoutput.temp")
R> sapply(strsplit(basename(myfile2),"\\."), function(x)
     paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput" "myoutput"
R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
R> sapply(strsplit(basename(myfile3),"\\."), function(x)
     paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput"   "myoutput"   "my.out.put"

HTH.

Cheers,

Berwin

 On Fri, Jan 9, 2009 at 12:10 AM, Gundala Viswanath
 gunda...@gmail.comwrote:
 
  Dear all,
 
  The basename() function returns the extension also:
 
   > myfile <- "path1/path2/myoutput.txt"
   > basename(myfile)
  [1] "myoutput.txt"
 
 
  Is there any other function where it just returns
  plain base:
 
  myoutput
 
  i.e. without 'txt'
 
  - Gundala Viswanath
  Jakarta - Indonesia

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: sta...@nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 G'day all,

 On Fri, 9 Jan 2009 08:12:18 -0200
 Henrique Dallazuanna www...@gmail.com wrote:

   
 Try this also:

 substr(basename(myfile), 1, nchar(basename(myfile)) - 4)
 

 Or, in case that the extension has more than three letters or myfile
 is a vector of names:

 R> myfile <- "path1/path2/myoutput.txt"
 R> sapply(strsplit(basename(myfile),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput"
 R> myfile2 <- c(myfile, "path2/path3/myoutput.temp")
 R> sapply(strsplit(basename(myfile2),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput" "myoutput"
 R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
 R> sapply(strsplit(basename(myfile3),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput"   "myoutput"   "my.out.put"

   

or have sub do the job for you:

filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
(filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))



vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Gabor Grothendieck
On Fri, Jan 9, 2009 at 6:52 AM, Wacek Kusnierczyk
waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 Berwin A Turlach wrote:
 G'day all,

 On Fri, 9 Jan 2009 08:12:18 -0200
 Henrique Dallazuanna www...@gmail.com wrote:


 Try this also:

 substr(basename(myfile), 1, nchar(basename(myfile)) - 4)


 Or, in case that the extension has more than three letters or myfile
 is a vector of names:

 R> myfile <- "path1/path2/myoutput.txt"
 R> sapply(strsplit(basename(myfile),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput"
 R> myfile2 <- c(myfile, "path2/path3/myoutput.temp")
 R> sapply(strsplit(basename(myfile2),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput" "myoutput"
 R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
 R> sapply(strsplit(basename(myfile3),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput"   "myoutput"   "my.out.put"



 or have sub do the job for you:

 filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
 (filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))

We can omit perl = TRUE here.



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 On Fri, Jan 9, 2009 at 6:52 AM, Wacek Kusnierczyk

   
 or have sub do the job for you:

 filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
 (filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))
 

 We can omit perl = TRUE here.

   


or maybe not, depending on the actual task:

names = replicate(1, paste(sample(c(letters, "."), 100,
replace=TRUE), collapse=""))
system.time(replicate(10, sub("[.][^.]*$", "", names, perl=TRUE)))
system.time(replicate(10, sub("[.][^.]*$", "", names)))

vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Berwin A Turlach
G'day Wacek,

On Fri, 09 Jan 2009 12:52:46 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 Berwin A Turlach wrote:
  G'day all,
 
  On Fri, 9 Jan 2009 08:12:18 -0200
  Henrique Dallazuanna www...@gmail.com wrote:
 

  Try this also:
 
  substr(basename(myfile), 1, nchar(basename(myfile)) - 4)
  
 
  Or, in case that the extension has more than three letters or
  myfile is a vector of names:
 
  R> myfile <- "path1/path2/myoutput.txt"
  R> sapply(strsplit(basename(myfile),"\\."), function(x)
       paste(x[1:(length(x)-1)], collapse="."))
  [1] "myoutput"
  R> myfile2 <- c(myfile, "path2/path3/myoutput.temp")
  R> sapply(strsplit(basename(myfile2),"\\."), function(x)
       paste(x[1:(length(x)-1)], collapse="."))
  [1] "myoutput" "myoutput"
  R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
  R> sapply(strsplit(basename(myfile3),"\\."), function(x)
       paste(x[1:(length(x)-1)], collapse="."))
  [1] "myoutput"   "myoutput"   "my.out.put"
 

 
 or have sub do the job for you:
 
 filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
 (filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))

Apparently also a possibility, I guess it can be made to work with the
original example and my extensions.

Though, it seems to require the knowledge of perl, or at least perl's
regular expression. 

Cheers,

Berwin



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 G'day Wacek,


   
 Or, in case that the extension has more than three letters or
 myfile is a vector of names:

 R> myfile <- "path1/path2/myoutput.txt"
 R> sapply(strsplit(basename(myfile),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput"
 R> myfile2 <- c(myfile, "path2/path3/myoutput.temp")
 R> sapply(strsplit(basename(myfile2),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput" "myoutput"
 R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
 R> sapply(strsplit(basename(myfile3),"\\."), function(x)
      paste(x[1:(length(x)-1)], collapse="."))
 [1] "myoutput"   "myoutput"   "my.out.put"

   
   
 or have sub do the job for you:

 filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
 (filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))
 

   

g'afternoon berwin,

 Apparently also a possibility, I guess it can be made to work with the
 original example and my extensions.
   

i guess it does work with the original example and your extensions.

 Though, it seems to require the knowledge of perl, or at least perl's
 regular expression. 
   

oh my, sorry.  it's so bad to go an inch out of the cosy world of r.
but, as gabor pointed out, 'perl=TRUE' is inessential here, so you actually
need to know just (very basic) regular expressions, with no 'perl'
implied.  having learnt this simple regex syntax you can avoid the need
for looking up strsplit and paste in tfm, so i'd consider it worthwhile.

vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Berwin A Turlach
G'day Wacek,

On Fri, 09 Jan 2009 14:22:19 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

  Apparently also a possibility, I guess it can be made to work with
  the original example and my extensions.

 
 i guess it does work with the original example and your extensions.

And I thought that you would have known for sure.

  Though, it seems to require the knowledge of perl, or at least
  perl's regular expression.
 
 oh my, sorry.  it's so bad to go an inch out of the cosy world of r.

Well, if that's how you feel, don't do it.

I regularly use other languages besides R.  Mostly C and Fortran,
occasionally Python.  But I never found time to learn Perl or Java or
awk or C++ or ...; some people do not have the time to learn all
languages under the sun. Also, if one concentrates on a few, one can
learn them really well.

 but, as gabor pointed out, 'perl=TRUE' is inessential here,

I thought that your answer to Gabor indicated that, depending on the
context, perl=TRUE was essential; though I must admit that I did not
run that code.  

 so you actually need to know just (very basic) regular expressions,
 with no 'perl' implied.  having learnt this simple regex syntax you
 can avoid the need for looking up strsplit and paste in tfm, so i'd
 consider it worthwhile.

As people say, YMMV, I do not need to look up strsplit and/or paste;
but I would have to look up the regular expression syntax or
finally memorise it; something I did not consider worthwhile so far.

Cheers,

Berwin



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 G'day Wacek,

 On Fri, 09 Jan 2009 14:22:19 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 Apparently also a possibility, I guess it can be made to work with
 the original example and my extensions.
   
   
 i guess it does work with the original example and your extensions.
 

 And I thought that you would have known for sure.
   

i thought i did until you made that comment, which made me think you've
just discovered it doesn't.

   
 Though, it seems to require the knowledge of perl, or at least
 perl's regular expression.
   
 oh my, sorry.  it's so bad to go an inch out of the cosy world of r.
 

 Well, if that's how you feel, don't do it.
   

quite the opposite.

 I regularly use other languages besides R.  Mostly C and Fortran,
 occasionally Python.  But I never found time to learn Perl or Java or
 awk or C++ or ...; some people do not have the time to learn all
 languages under the sun. Also, if one concentrates on a few, one can
 learn them really well.
   

i think i did not suggest the original poster to learn perl.

many responses on this list involve regular expressions, and regexes are
so ubiquitous in code that has to do with parsing and processing text,
be it filenames or loads of data, that a user of r may well want to
learn a bit of this stuff in addition to the details about how real
numbers are represented below the surface.

you may want to keep saying 'use r where applicable instead of worse
tools', and i'd like to keep saying 'use regexes where they're
applicable instead of worse tools'.  same philosophy.


   
 but, as gabor pointed out, 'perl=TRUE' is inessential here,
 

 I thought that your answer to Gabor indicated that, depending on the
 context, perl=TRUE was essential; though I must admit that I did not
 run that code.  
   

i see i may have expressed that wrongly.  it should have been 'but maybe
you'd want to keep it', meaning this is not essential for the final
result, but may improve the runtime.


   
 so you actually need to know just (very basic) regular expressions,
 with no 'perl' implied.  having learnt this simple regex syntax you
 can avoid the need for looking up strsplit and paste in tfm, so i'd
 consider it worthwhile.
 

 As people say, YMMV, I do not need to look up strsplit and/or paste;
 but I would have to look up the regular expression syntax or
 finally memorise it; something I did not consider worthwhile so far.
   

well, the regex syntax is fairly standard, though variations exist among
languages.  once you learn it, or rather the ideas behind the syntax,
you're well equipped for quite a range of tasks.  on the other hand, the
details of strsplit and paste are pretty r-specific, and you don't gain
much by remembering them (except for freeing yourself from having to
read tfm again).

i have seen quite a bunch of programs written by scientists who spent
over one hundred lines of code on just parsing command line arguments; 
i wish they knew regexes exist (and better, getopt-like modules too). 
if you're doing serious programming without knowing regexes, you're
rather lucky.

vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Wacek Kusnierczyk
Prof Brian Ripley wrote:

 Actually, that's a valid regex in any of the variants offered.  A more
 conventional writing of it is the second of

 > f <- 'foo.bar.R'
 > sub("[.][^.]*$", "", f)
 [1] "foo.bar"
 > sub("\\.[^.]*$", "", f)
 [1] "foo.bar"


more conventional in r, perhaps.  it's not portable, due to the 'escape
the escape to have an escape' feature of r when it comes to regexes; in
perl, for example, /\\.[^.]*$/ would hardly do the job.
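
the doubled backslash belongs to r's string literal syntax, not to the
regex itself.  a small sketch (the variable name pattern is only for
illustration) that shows what the regex engine actually receives:

```r
# In R source the backslash must itself be escaped inside a string literal
pattern <- "\\.[^.]*$"

# writeLines() shows the stored string, i.e. what the regex engine sees
writeLines(pattern)

# nchar() counts stored characters, so the two source backslashes count once
n_chars <- nchar(pattern)
print(n_chars)

# The pattern removes the last dot and everything after it
stripped <- sub(pattern, "", "foo.bar.R")
print(stripped)
```

so the regex is the same single-backslash one a perl user would write;
only the quoting layer differs.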

vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Berwin A Turlach
G'day Wacek,

On Fri, 09 Jan 2009 15:19:46 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 i think i did not suggest the original poster to learn perl.

As I see it, you didn't suggest anything to the original poster, at
least not directly.  But, since these days you have to be subscribed
to r-help to post IIRC, it is probably reasonable to assume that the
original poster saw your posting.
 
 many responses on this list involve regular expressions, and regexes
 are so ubiquitous in code that has to do with parsing and processing
 text, be it filenames or loads of data, that a user of r may well
 want to learn a bit of this stuff in addition to the details about
 how real numbers are represented below the surface.

You should really get rid of that chip on your shoulder.

 you may want to keep saying 'use r where applicable instead of worse
 tools', 

Now I am getting really confused; isn't your suggested solution
using R too?  So why would I say something like this?  I suggest you get
rid of the chip on the other shoulder too... :)

 well, the regex syntax is fairly standard, though variations exist
 among languages.  

Exactly, and the existence of these variations, which can trip one up
and require one to rtfm in anything but the simplest situations,
is why regexps are not what jumps first to my mind.

 i have seen quite a bunch of programs written by scientists who spent
 over one hundred lines of code on just parsing command line
 arguments; i wish they knew regexes exist (and better, getopt-like
 modules too). if you're doing serious programming without knowing
 regexes, you're rather lucky.

Probably depends on what one calls serious.

Best wishes,

Berwin



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Henrique Dallazuanna
Right,

Another option is:



substr(nameFile, 1, tail(unlist(gregexpr("\\.", nameFile)), 1) - 1)
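
The call above handles a single name, since unlist() pools the dot
positions of all elements.  A possible vectorised variant is sketched
below; the contents of nameFile are made up for illustration, and names
without a dot are returned unchanged, which is an added assumption:

```r
# Hypothetical input; the original post does not define nameFile
nameFile <- c("myoutput.txt", "my.out.put.xls", "README")

# Position of the last literal dot in each name (-1 when there is none)
last_dot <- sapply(gregexpr(".", nameFile, fixed = TRUE),
                   function(pos) pos[length(pos)])

# Keep everything before the last dot; leave dot-less names untouched
stripped <- ifelse(last_dot > 0,
                   substr(nameFile, 1, last_dot - 1),
                   nameFile)
print(stripped)
```

A name such as .bashrc would still come back as an empty string here,
because its only dot is in position 1.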

On Fri, Jan 9, 2009 at 1:23 PM, Rau, Roland r...@demogr.mpg.de wrote:

 Hi,

  [mailto:r-help-boun...@r-project.org] On Behalf Of Henrique
  Dallazuanna
 
  Try this also:
 
  substr(basename(myfile), 1, nchar(basename(myfile)) - 4)
 

 This, of course, assumes that the extensions are always 3 characters.
 Sometimes there might be more (index.html), sometimes less
 (shellscript.sh).

 Although my solution is not as compact as the others (I wish I was
 proficient in 'mastering regular expressions'), I'd like to provide my
 little code-snippet which does not require any regular expressions (but
 expects a . in the filename).

 ##
 x1 <- "roland.txt"
 x2 <- "roland.html"
 x3 <- "roland.sh"

 no.extension <- function(astring) {
   if (substr(astring, nchar(astring), nchar(astring))==".") {
     return(substr(astring, 1, nchar(astring)-1))
   } else {
     no.extension(substr(astring, 1, nchar(astring)-1))
   }
 }

 no.extension(x1)
 no.extension(x2)
 no.extension(x3)
 ##

 Hope this helps a bit,
 Roland

 P.S. Any suggestions how to become more proficient with regular
 expressions? The O'Reilly book (Mastering...)? Whenever I tried
 anything more complicated than basic usage (things like ^ $ * . ) in R,
 I was way faster to write a new function (like above) instead of finding
 a regex solution.

 By the way: it might be still possible to *write* regular expressions,
 but what about code re-use? Are there people who can easily *read*
 complicated regular expressions?

 --
 This mail has been sent through the MPI for Demographic Research.  Should
 you receive a mail that is apparently from a MPI user without this text
 displayed, then the address has most likely been faked. If you are uncertain
 about the validity of this message, please check the mail header or ask your
 system administrator for assistance.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O




Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Duncan Murdoch

On 1/8/2009 9:10 PM, Gundala Viswanath wrote:

Dear all,

The basename() function returns the extension also:


> myfile <- "path1/path2/myoutput.txt"
> basename(myfile)

[1] "myoutput.txt"


Is there any other function where it just returns
plain base:

myoutput

i.e. without 'txt'


I'm curious about something: does file extension have a standard 
definition?  Most (all?  I haven't tried them all) of the solutions 
presented in this thread would return an empty string for the plain 
base if given the filename .bashrc.


Windows (where file extensions really mean something), though reluctant 
to create such a file, appears to agree that the extension is bashrc, 
even though to me it appears clear that that file has no extension.
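
As an aside that is not part of the thread: later versions of R provide
tools::file_path_sans_ext(), which takes a position on this question.
A short sketch of how it treats the names discussed here (the paths are
the examples used in this thread):

```r
# Example paths taken from this thread
paths <- c("path1/path2/myoutput.txt",
           "path4/path5/my.out.put.xls",
           ".bashrc")

# Strip the directory first, then the extension
sans_ext <- tools::file_path_sans_ext(basename(paths))
print(sans_ext)
```

The function requires at least one non-dot character before the final
dot, so a leading-dot name such as .bashrc is treated as having no
extension.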


Duncan Murdoch



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Rau, Roland
Hi, 

 [mailto:r-help-boun...@r-project.org] On Behalf Of Henrique 
 Dallazuanna
 
 Try this also:
 
 substr(basename(myfile), 1, nchar(basename(myfile)) - 4)
 

This, of course, assumes that the extensions are always 3 characters.
Sometimes there might be more (index.html), sometimes less
(shellscript.sh).

Although my solution is not as compact as the others (I wish I was
proficient in 'mastering regular expressions'), I'd like to provide my
little code-snippet which does not require any regular expressions (but
expects a . in the filename).

##
x1 <- "roland.txt"
x2 <- "roland.html"
x3 <- "roland.sh"

no.extension <- function(astring) {
  if (substr(astring, nchar(astring), nchar(astring))==".") {
    return(substr(astring, 1, nchar(astring)-1))
  } else {
    no.extension(substr(astring, 1, nchar(astring)-1))
  }
}

no.extension(x1)
no.extension(x2)
no.extension(x3)
##
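
The recursive version above never terminates normally when the name
contains no dot.  A non-recursive sketch that still avoids regular
expressions is given below; the name no_extension and the choice to
return dot-less names unchanged are assumptions, not part of the
original snippet:

```r
# Regex-free sketch: find the last "." by splitting into single characters
no_extension <- function(astring) {
  chars <- strsplit(astring, "")[[1]]   # one element per character
  dots <- which(chars == ".")           # positions of all dots
  if (length(dots) == 0) {
    return(astring)                     # no dot: nothing to strip
  }
  substr(astring, 1, max(dots) - 1)     # keep text before the last dot
}

results <- c(no_extension("roland.txt"),
             no_extension("roland.html"),
             no_extension("my.out.put.xls"),
             no_extension("roland"))
print(results)
```

Like the original, it works on one string at a time; sapply() can be
used to apply it to a vector.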

Hope this helps a bit,
Roland

P.S. Any suggestions how to become more proficient with regular
expressions? The O'Reilly book (Mastering...)? Whenever I tried
anything more complicated than basic usage (things like ^ $ * . ) in R,
I was way faster to write a new function (like above) instead of finding
a regex solution.

By the way: it might be still possible to *write* regular expressions,
but what about code re-use? Are there people who can easily *read*
complicated regular expressions?




Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Gabor Grothendieck
On Fri, Jan 9, 2009 at 10:23 AM, Rau, Roland r...@demogr.mpg.de wrote:
 Hi,

 [mailto:r-help-boun...@r-project.org] On Behalf Of Henrique
 Dallazuanna

 Try this also:

 substr(basename(myfile), 1, nchar(basename(myfile)) - 4)


 This, of course, assumes that the extensions are always 3 characters.

The regex solutions posted did not make this assumption.


 P.S. Any suggestions how to become more proficient with regular
 expressions? The O'Reilly book (Mastering...)? Whenever I tried
 anything more complicated than basic usage (things like ^ $ * . ) in R,
 I was way faster to write a new function (like above) instead of finding
 a regex solution.

See the links in the Links box at:
http://gsubfn.googlecode.com




 By the way: it might be still possible to *write* regular expressions,
 but what about code re-use? Are there people who can easily *read*
 complicated regular expressions?



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Peter Dalgaard
Duncan Murdoch wrote:
 On 1/8/2009 9:10 PM, Gundala Viswanath wrote:
 Dear all,

 The basename() function returns the extension also:

 > myfile <- "path1/path2/myoutput.txt"
 > basename(myfile)
 [1] "myoutput.txt"


 Is there any other function where it just returns
 plain base:

 myoutput

 i.e. without 'txt'
 
 I'm curious about something: does file extension have a standard
 definition?  Most (all?  I haven't tried them all) of the solutions
 presented in this thread would return an empty string for the plain
 base if given the filename .bashrc.
 
 Windows (where file extensions really mean something), though reluctant
 to create such a file, appears to agree that the extension is bashrc,
 even though to me it appears clear that that file has no extension.

I'm not sure what is clear about it, but the GNU utility agrees with you:

$ basename abc/.exe .exe
.exe
$ basename abc/1.exe .exe
1

Anyone want to contribute code for an optional suffix= argument for R's
basename()?
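
One way such an argument might behave is sketched below.  The wrapper
name basename_suffix is hypothetical, the suffix is treated as a literal
string rather than a regular expression, and the rule of keeping a name
that consists only of the suffix follows the GNU output shown above:

```r
# Hypothetical wrapper around basename(); not part of base R
basename_suffix <- function(path, suffix = "") {
  base <- basename(path)
  n <- nchar(base)
  k <- nchar(suffix)
  # Strip only when the name ends in the suffix AND something would remain
  strip <- k > 0 & n > k & substr(base, n - k + 1, n) == suffix
  ifelse(strip, substr(base, 1, n - k), base)
}

gnu_cases <- basename_suffix(c("abc/.exe", "abc/1.exe"), ".exe")
print(gnu_cases)

literal_dot <- basename_suffix(c("foobar", "dir/foo.bar"), ".bar")
print(literal_dot)
```

Treating the suffix literally sidesteps the unescaped-dot problem, at
the cost of not accepting patterns.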

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Marc Schwartz
on 01/09/2009 09:00 AM Duncan Murdoch wrote:
 On 1/8/2009 9:10 PM, Gundala Viswanath wrote:
 Dear all,

 The basename() function returns the extension also:

 > myfile <- "path1/path2/myoutput.txt"
 > basename(myfile)
 [1] "myoutput.txt"


 Is there any other function where it just returns
 plain base:

 myoutput

 i.e. without 'txt'
 
 I'm curious about something: does file extension have a standard
 definition?  Most (all?  I haven't tried them all) of the solutions
 presented in this thread would return an empty string for the plain
 base if given the filename .bashrc.
 
 Windows (where file extensions really mean something), though reluctant
 to create such a file, appears to agree that the extension is bashrc,
 even though to me it appears clear that that file has no extension.

Duncan,

That is going to be highly OS and even OS version specific. More
information here:

  http://en.wikipedia.org/wiki/Filename
  http://en.wikipedia.org/wiki/Filename_extension

There are relevant standard extensions for standard file formats:

  http://en.wikipedia.org/wiki/List_of_file_formats

but that does not guarantee that user created filenames will adhere to
them, especially for text files.

As you note, filenames beginning with a '.' will be common on
Unixen/Linuxen as otherwise normally hidden system/config files. Such
files would actually create problems if attempted to be opened on
Windows with certain applications and I have even seen problems with
such files when using SMB under Linux to access files on a server.

HTH,

Marc Schwartz



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Wacek Kusnierczyk

 Duncan Murdoch wrote:
   

 I'm curious about something: does file extension have a standard
 definition?  Most (all?  I haven't tried them all) of the solutions
 presented in this thread would return an empty string for the plain
 base if given the filename .bashrc.
 


right;  there's a straightforward fix to my solution that accounts for
cases such as '.bashrc':

names = c("foo.bar", ".zee")
sub("(.+)[.][^.]+$", "\\1", names)

you could also use a lookbehind if possible (not in r, afaik).
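
for reference, r's regex functions do accept lookbehind assertions when
called with perl=TRUE.  a sketch of the lookbehind variant, using the
same two example names; the variable name no_ext is only for
illustration:

```r
names <- c("foo.bar", ".zee")

# (?<=.) requires some character before the dot, so a leading dot
# is never treated as the start of an extension
no_ext <- sub("(?<=.)[.][^.]+$", "", names, perl = TRUE)
print(no_ext)
```

with the lookbehind there is no need for a capture group or a
backreference in the replacement.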

vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Wacek Kusnierczyk
Rau, Roland wrote:

 P.S. Any suggestions how to become more proficient with regular
 expressions? The O'Reilly book (Mastering...)? Whenever I tried
 anything more complicated than basic usage (things like ^ $ * . ) in R,
 I was way faster to write a new function (like above) instead of finding
 a regex solution.
   

the book you mention is good.
you may also consider http://www.regular-expressions.info/

regexes are usually well explained with lots of examples in perl books.

 By the way: it might be still possible to *write* regular expressions,
 but what about code re-use? Are there people who can easily *read*
 complicated regular expressions?
   


in some cases it is possible to write regular expressions in a way that
facilitates reading them by a human.  in perl, for example, you can use
so-called readable regexes:

/
   (.+)# match and remember at least one arbitrary character
   [.] # match a dot
   [^.]+ # match at least one non-dot character
   $  # end of string anchor
/x;

you can also use within regex comments:

/(.+)(?# one or more chars)[.](?# a dot)[^.]+(?# one or more
non-dots)$(?# end of string)/


nothing of the sort in r, however.

vQ



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Gabor Grothendieck
On Fri, Jan 9, 2009 at 4:20 PM, Wacek Kusnierczyk
waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 Duncan Murdoch wrote:


 I'm curious about something: does file extension have a standard
 definition?  Most (all?  I haven't tried them all) of the solutions
 presented in this thread would return an empty string for the plain
 base if given the filename .bashrc.



 right;  there's a straightforward fix to my solution that accounts for
 cases such as '.bashrc':

 names = c("foo.bar", ".zee")
 sub("(.+)[.][^.]+$", "\\1", names)

 you could also use a lookbehind if possible (not in r, afaik).


or:

> sub(".*[.]", ".", names)
[1] ".bar" ".zee"



Re: [R] Extracting File Basename without Extension

2009-01-09 Thread Gabor Grothendieck
On Fri, Jan 9, 2009 at 4:28 PM, Wacek Kusnierczyk
waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 Rau, Roland wrote:

 P.S. Any suggestions how to become more proficient with regular
 expressions? The O'Reilly book (Mastering...)? Whenever I tried
 anything more complicated than basic usage (things like ^ $ * . ) in R,
 I was way faster to write a new function (like above) instead of finding
 a regex solution.


 the book you mention is good.
 you may also consider http://www.regular-expressions.info/

 regexes are usually well explained with lots of examples in perl books.

 By the way: it might be still possible to *write* regular expressions,
 but what about code re-use? Are there people who can easily *read*
 complicated regular expressions?



 in some cases it is possible to write regular expressions in a way that
 facilitates reading them by a human.  in perl, for example, you can use
 so-called readable regexes:

 /
   (.+)# match and remember at least one arbitrary character
   [.] # match a dot
   [^.]+ # match at least one non-dot character
   $  # end of string anchor
 /x;

 you can also use within regex comments:

 /(.+)(?# one or more chars)[.](?# a dot)[^.]+(?# one or more
 non-dots)$(?# end of string)/


 nothing of the sort in r, however.

R supports that if you begin the regular expression with (?x) and
use perl = TRUE.  See ?regexp
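
A minimal sketch of what that looks like, reusing the pattern from the quoted message. The variable names are illustrative, and this assumes an R build with the usual PCRE support behind perl = TRUE.

```r
# Free-spacing ("extended") pattern: with (?x) and perl = TRUE, whitespace
# outside character classes is ignored and '#' starts a comment that runs
# to the end of the line.
pattern <- "(?x)
  (.+)     # match and remember at least one arbitrary character
  [.]      # match a dot
  [^.]+    # match at least one non-dot character
  $        # end-of-string anchor
"
readable <- sub(pattern, "\\1", c("foo.bar", ".zee"), perl = TRUE)

# The inline (?# ...) comment form is also accepted with perl = TRUE.
inline <- sub("(.+)(?# one or more chars)[.](?# a dot)[^.]+(?# non-dots)$",
              "\\1", c("foo.bar", ".zee"), perl = TRUE)
print(readable)
print(inline)
```

Both calls return "foo" and ".zee", the same result as the compact pattern.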



[R] Extracting File Basename without Extension

2009-01-08 Thread Gundala Viswanath
Dear all,

The basename() function returns the extension also:

> myfile <- "path1/path2/myoutput.txt"
> basename(myfile)
[1] "myoutput.txt"


Is there any other function where it just returns
plain base:

myoutput

i.e. without 'txt'

- Gundala Viswanath
Jakarta - Indonesia



Re: [R] Extracting File Basename without Extension

2009-01-08 Thread jim holtman
You can use 'sub' to get rid of the extensions:

> sub("^([^.]*).*", "\\1", 'filename.extension')
[1] "filename"
> sub("^([^.]*).*", "\\1", 'filename.extension.and.more')
[1] "filename"
> sub("^([^.]*).*", "\\1", 'filename without extension')
[1] "filename without extension"
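
Note that this pattern cuts at the first dot, so it needs basename() first when the input is a path, and it shortens names that contain more than one dot. A small sketch contrasting it with cutting at the last dot; the path with two dots is a made-up example, and tools::file_path_sans_ext() is assumed to be available in the installed R version.

```r
myfile <- "path1/path2/my.output.txt"   # hypothetical path with two dots

# Pattern from the reply above: keeps everything before the FIRST dot.
first_dot <- sub("^([^.]*).*", "\\1", basename(myfile))

# Alternative: drop only the part after the LAST dot.
last_dot <- sub("[.][^.]*$", "", basename(myfile))

# Assumes an R version where the tools package provides file_path_sans_ext().
sans_ext <- tools::file_path_sans_ext(basename(myfile))

print(c(first_dot, last_dot, sans_ext))
```

Here first_dot is "my", while last_dot and sans_ext are both "my.output". Which one is wanted depends on whether "output.txt" or only "txt" counts as the extension.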


On Thu, Jan 8, 2009 at 9:10 PM, Gundala Viswanath gunda...@gmail.com wrote:
 Dear all,

 The basename() function returns the extension also:

 > myfile <- "path1/path2/myoutput.txt"
 > basename(myfile)
 [1] "myoutput.txt"


 Is there any other function where it just returns
 plain base:

 myoutput

 i.e. without 'txt'

 - Gundala Viswanath
 Jakarta - Indonesia





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
