What factor variables are
A "factor" is a vector whose elements can take on only one of a specific set of 
values. For example, "Sex" will usually take on only the values "M" or "F", 
whereas "Name" will generally have lots of possibilities. The set of values 
that the elements of a factor can take is called its levels. If you want to 
add a new level to a factor, you can do that, but you can't just change 
elements to new values that aren't already levels. Here's an example. I'll 
start by creating a factor whose values are "a", "b", and "c". The factor() 
function will do this, and it will generate the levels automatically. 
> a <- factor (c("a", "b", "c", "b", "c", "b", "a", "c")) # create the factor
> a                     # Print the new variable
[1] a b c b c b a c     # You can tell it's not a character vector: no quotes
> levels(a)             # Here is the set of levels
[1] "a" "b" "c"
#
# What if I try to change an element to a new value, like "d"?
#
> a[3] <- "d"
Warning messages:
  Warning in "[<-.factor"(a, 3, value = "d"): replacement values not all in levels(x): NA's generated
#
# The warning message tells you that some NAs have been generated.
#
> a
[1] a  b  NA b  c  b  a  c 
#
# However it's okay to set elements to values that are already levels:
#
> a[3] <- "a"
> a
[1] a b a b c b a c
#
# It's also easy to change levels. Here I'll change the "a"'s to "AA". Notice that
# I don't change the values themselves, just the levels.
#
> levels(a)
[1] "a" "b" "c"
> levels(a)[1] <- "AA"
> a
[1] AA b  AA b  c  b  AA c 
#
# The general way to convert a factor to character is with as.character():
#
> as.character(a)
[1] "AA" "b"  "AA" "b"  "c"  "b"  "AA" "c" 
#
# If element 3 were still NA, as.character() would return it as "NA", a
# regular string rather than a missing value.
#
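To make the behavior in this transcript concrete, here is a toy model in Python. It is not S-Plus code, and it is not how S-Plus actually implements factors. A factor is modeled as a list of 1-based integer codes plus a list of levels. Assigning a value that isn't a level stores a missing value (None) and issues a warning. Renaming a level relabels every element that uses it. The class name Factor and its methods are hypothetical. Python indexes from 0, so a[2] here corresponds to a[3] in S-Plus.

```python
import warnings


class Factor:
    """Toy model of a factor: 1-based integer codes plus a list of levels."""

    def __init__(self, values, levels=None):
        # By default the levels are the sorted unique values.
        self.levels = sorted(set(values)) if levels is None else list(levels)
        # None stands in for a missing value (NA).
        self.codes = [self.levels.index(v) + 1 if v in self.levels else None
                      for v in values]

    def __setitem__(self, i, value):
        if value in self.levels:
            self.codes[i] = self.levels.index(value) + 1
        else:
            # Not a level: store a missing value and warn, as S-Plus did above.
            warnings.warn("replacement value not in levels: NA generated")
            self.codes[i] = None

    def values(self):
        """Decode the codes back into level labels."""
        return [None if c is None else self.levels[c - 1] for c in self.codes]


a = Factor(["a", "b", "c", "b", "c", "b", "a", "c"])
a[2] = "d"          # warns; element becomes None
a[2] = "a"          # fine: "a" is already a level
a.levels[0] = "AA"  # rename a level; the codes are untouched
print(a.values())   # ['AA', 'b', 'AA', 'b', 'c', 'b', 'AA', 'c']
```

Renaming works by editing only the levels list. That is why the change reaches every element at once.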

By default the levels are the unique data values sorted alphabetically. This 
matters in some statistical models, where the first level is used as the 
baseline (see "Reordering the levels of a factor" below). You can reorder the 
levels if you want. 

Internal Storage and Extra Levels
Factor variables are stored internally as numeric variables together with 
their levels. The actual values of the numeric variable are 1, 2, and so on. 
Not every level has to appear in the vector. In this example I create a factor 
variable with four levels, even though I only actually have data in three of 
them. 
> a <- factor (c(1, 2, 3, 2, 3, 2, 1), levels=1:4, labels=c("Small", "Medium", 
+ "Large", "Huge")) # Create it
#
# In this example, the "levels=1:4" is required. Without it, the levels would be
# taken from the data, which has only three distinct values, and the mismatch
# with the four labels would get you in trouble. Of course the values in
# "levels" need to match the values in the data.
#
> a
[1] Small  Medium Large  Medium Large  Medium Small 
Levels:
[1] "Small"  "Medium" "Large"  "Huge"  
#
# Notice how the levels (including "Huge") print out. In general the levels will
# print out whenever they don't all appear in whatever's being printed.
#
# Take a look at the table of a. The "Huge" level is remembered.
#
> table (a)
 Small Medium Large Huge 
     2      3     2    0
Unused levels also turn up when you change one value to another in the GUI. If 
you try to change "a" to "b" but accidentally type "ab", a level named "ab" is 
created. If you then correct the "ab" to "b", the "ab" level remains. By the 
way, you can get rid of unused levels when subscripting by using the drop=T 
argument: 
> table (a[,drop=T])
 Small Medium Large 
     2      3     2
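The sketch below models how table() keeps the unused "Huge" level and how drop=T discards it. It is written in Python as an illustration, not S-Plus code. The function name factor_table and its drop parameter are hypothetical. The codes are the 1-based level numbers described above.

```python
from collections import Counter


def factor_table(codes, levels, drop=False):
    """Count each level, given 1-based codes (None = missing, not counted).

    Unused levels get a count of 0 unless drop=True removes them.
    """
    counts = Counter(c for c in codes if c is not None)
    table = {lvl: counts.get(i + 1, 0) for i, lvl in enumerate(levels)}
    if drop:
        table = {lvl: n for lvl, n in table.items() if n > 0}
    return table


codes = [1, 2, 3, 2, 3, 2, 1]  # Small Medium Large Medium Large Medium Small
levels = ["Small", "Medium", "Large", "Huge"]
print(factor_table(codes, levels))             # {'Small': 2, 'Medium': 3, 'Large': 2, 'Huge': 0}
print(factor_table(codes, levels, drop=True))  # {'Small': 2, 'Medium': 3, 'Large': 2}
```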

Missing values in factors
Missing values in factor variables can be a drag. They're invisible to the 
table() function, even when you use the exclude=NULL argument that is supposed 
to make table() count them. (Reading the help file carefully tells us that 
S-Insightful knows this is a problem but hasn't fixed it.) The na.include() 
function will add a new level named "NA" to a factor with NAs in it. 
> a[3] <- NA # Make one entry NA
> a          # Sure enough
[1] Small  Medium NA     Medium Large  Medium Small 
Levels:
[1] "Small"  "Medium" "Large"  "Huge"  
#
# It's still missing from table(), even with exclude=NULL
#
> table (a, exclude=NULL)
 Small Medium Large Huge 
     2      3     1    0
> sum (is.na (a))              # How many NAs are there in this vector?
[1] 1                          # Answer: 1
#
# Now run na.include(a) and save the result
#
> aa <- na.include(a)
> table (aa)
 Small Medium Large Huge NA 
     2      3     1    0  1
> sum (is.na (aa))             # How many NAs here?
[1] 0
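Here is a minimal Python sketch of what the text says na.include() does. Missing entries (None here) are moved into a new level named "NA", and afterwards nothing is missing. The function name is hypothetical. The sketch also assumes, without checking S-Plus, that a factor with no missing values comes back unchanged.

```python
def na_include(codes, levels):
    """Move missing entries (None) into a new level named "NA".

    Returns new (codes, levels); the inputs are not modified.
    """
    if None not in codes:
        return list(codes), list(levels)  # assumption: nothing to do
    new_levels = list(levels) + ["NA"]
    na_code = len(new_levels)  # the new level is last
    new_codes = [na_code if c is None else c for c in codes]
    return new_codes, new_levels


codes = [1, 2, None, 2, 3, 2, 1]  # Small Medium NA Medium Large Medium Small
levels = ["Small", "Medium", "Large", "Huge"]
aa_codes, aa_levels = na_include(codes, levels)
print(aa_levels)               # ['Small', 'Medium', 'Large', 'Huge', 'NA']
print(aa_codes.count(None))    # 0 missing values left
```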

Q: Why is my character variable a factor?
When you construct a data.frame with read.table() or by importing, the default 
decision is to turn every character variable into a factor. This may or may not 
be a good idea for you (see "When do I want a factor variable?" below). If you 
don't want factors, use the as.is argument to read.table(). A single T says 
"leave everything as is"; a vector of T's and F's results in the conversion of 
all the columns for which the as.is argument is F. If you're using File | 
Import Data, go to the Options tab and uncheck "Strings as factors." If you 
want some of your character variables to be characters and others to be 
factors, you'll need to use read.table(). 

Q: Why is my numeric variable a factor?
This usually happens when your "numeric" variable actually contains some 
non-numeric entries (like "NA" or "Missing" or an empty space). S-Plus sees 
that the column is not numeric, so it treats it as character and factorizes it 
(see the preceding answer). If you don't mind a few warnings, you can convert 
such a column back to numeric as follows. Suppose your data frame is named 
Steve and the column is G. Then this line converts the entries in Steve$G to 
numeric where possible; non-numeric entries in G will be turned into NAs. 
> Steve$G <- as.numeric(levels(Steve$G)[Steve$G])
#
# or, equivalently:
#
> Steve$G <- as.numeric(as.character(Steve$G))

The first form works because, internally, Steve$G is numeric (its level 
codes), so it can be used to index the vector of levels. 
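The first conversion works because the codes index the vector of levels. The Python sketch below models that lookup. It is not S-Plus code, and factorize and factor_to_numeric are hypothetical names. It assumes the levels are sorted as strings and that the string "NA" means missing, as in the factor() example later in this answer. Comparing the codes with the converted values shows why using the codes directly goes wrong.

```python
def factorize(strings):
    """Toy factor(): levels are the unique strings sorted as strings.

    The string "NA" is treated as missing (None); codes are 1-based.
    """
    levels = sorted({s for s in strings if s != "NA"})
    codes = [None if s == "NA" else levels.index(s) + 1 for s in strings]
    return codes, levels


def factor_to_numeric(codes, levels):
    """Like as.numeric(levels(x)[x]): index the levels by the codes, then parse.

    Labels that aren't numbers (and missing codes) become None, standing in for NA.
    """
    result = []
    for c in codes:
        label = None if c is None else levels[c - 1]
        try:
            result.append(float(label))
        except (TypeError, ValueError):
            result.append(None)
    return result


codes, levels = factorize(["8", "25", "111", "Missing"])
print(levels)                            # ['111', '25', '8', 'Missing']
print(codes)                             # [3, 2, 1, 4]  (not the values you wanted)
print(factor_to_numeric(codes, levels))  # [8.0, 25.0, 111.0, None]
```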
Watch out! If your numeric variable gets converted to a factor, the levels will 
be what you want, but the internal representation (the numbers 1, 2, and so on 
that S-Plus uses to keep track of things) generally will not be. The reason is 
that by default level 1 is assigned to the first value in alphabetical order, 
level 2 to the second value, and so on. So suppose that your values are 8, 25, 
111, and "Missing". When that gets imported, it will be recognized as character 
data and then converted to a factor, with the levels in alphabetical order: 
"111", "25", "8", "Missing". Of course the alphabetic sorting scheme is 
different from the numeric one. Here's an example: 
#
# Create and display a factor variable. The whole vector is converted to
# character before being factorized.
#
> factor (c(1, 3, 17, 4, "NA", 5))
[1] 1  3  17 4  NA 5
> as.numeric (factor (c(1, 3, 17, 4, "NA", 5)))
[1]  1  3  2  4 NA  5
In that example, the character string "17" sorts after "1" and before "3" (just 
as "Ag" comes between "A" and "B"), so "17" gets level 2. The as.numeric() 
function converts the factor into its level numbers, not into the values the 
labels show. That's probably not what you wanted. 

When do I want a factor variable?
Factor variables are useful in several places. First, some S-Plus functions 
that expect factors fail when given a character vector. (However, these are 
rare; the modeling functions generally convert character vectors to factors 
invisibly.) Second, it's sometimes handy to carry the set of levels around with 
you. Suppose you have a factor vector with four levels. Then table() is 
guaranteed to produce a four-entry table, whether you operate on the whole 
vector or on any subset. In contrast, the same operation on a character vector 
will produce only as many entries in the table as there are unique elements in 
the subset. So if you're planning to compare the distributions of subsets, 
you'll want a factor. Third, for huge data sets factor variables will generally 
be smaller, since each observation is stored as an integer and the levels are 
only stored once. 
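The second point is that a factor subset still reports every level. The Python sketch below models this. It is an illustration with hypothetical function names, not S-Plus code.

```python
from collections import Counter


def table_factor(values, levels):
    """Table of a factor: every level gets an entry, even if its count is 0."""
    counts = Counter(values)
    return {lvl: counts.get(lvl, 0) for lvl in levels}


def table_character(values):
    """Table of a character vector: only the values that actually occur, sorted."""
    return dict(sorted(Counter(values).items()))


levels = ["Small", "Medium", "Large", "Huge"]
data = ["Small", "Medium", "Small", "Large"]
subset = data[:2]                        # only "Small" and "Medium" occur here
print(table_factor(subset, levels))      # {'Small': 1, 'Medium': 1, 'Large': 0, 'Huge': 0}
print(table_character(subset))           # {'Medium': 1, 'Small': 1}
```

Tables from different factor subsets always have the same four entries in the same order, so they line up for comparison.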

When are factor variables a big pain?
Factor variables are a pain when you're cleaning your data, because they're 
hard to update. My approach has always been to convert the variable to 
character with as.character(), handle it as a character vector, and then 
convert it back to a factor (using factor() or as.factor()) at the end. 
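The convert, edit, and convert-back workflow can be sketched in Python as follows. This is a hypothetical helper, not an S-Plus function. It assumes there are no missing values and that the new levels should be the sorted unique strings, as factor() produces by default. Rebuilding the levels from the data is what makes a stale level disappear, such as the "ab" typo mentioned earlier.

```python
def clean_factor(codes, levels, fixes):
    """Convert to strings, apply fixes, and rebuild the factor.

    `fixes` is a hypothetical dict mapping bad labels to corrected ones.
    Assumes no missing values (every code is a valid 1-based index).
    """
    strings = [levels[c - 1] for c in codes]      # like as.character()
    strings = [fixes.get(s, s) for s in strings]  # ordinary string editing
    new_levels = sorted(set(strings))             # like factor(): levels from the data
    new_codes = [new_levels.index(s) + 1 for s in strings]
    return new_codes, new_levels


# A factor where a typo "ab" crept in as a level.
codes, levels = [1, 2, 3, 3], ["a", "ab", "b"]
print(clean_factor(codes, levels, {"ab": "b"}))  # ([1, 2, 2, 2], ['a', 'b'])
```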

What's one secret to converting factors to character vectors in a data frame?
Here's an interesting fact. Remember how you can refer to columns of a data 
frame either in matrix style (d[,1]) or in list style (d$a)? When you assign 
with matrix-style notation, S-Plus will often factorize your character 
variables automatically. That's not true for list-style notation, so list style 
is often what you want. Here's an example: 
#
# Create a simple data frame.
#
> d <- data.frame (a = c("a", "b", "c", "d"))
#
# The data.frame() function automatically converts characters to factors.
#
> is.factor (d[,1])
[1] T
#
# This should convert it back, but it doesn't.
#
> d[,1] <- as.character(d[,1])
> is.factor (d[,1])
[1] T
#
# This looks the same, but using list notation on the left makes all the difference.
#
> d$a <- as.character(d[,1])
> is.factor (d[,1])
[1] F
#
# This would have worked: the I() function ("I" standing for "identity") says
# "leave this just as I give it to you; don't convert it".
#
> d[,1] <- I(as.character(d[,1]))

Reordering the levels of a factor
This question arises in some models. The first level is set to be the baseline 
in the usual "treatment contrasts" setup (see the discussion of contrasts). 
Sometimes it's desirable to have a different level be the baseline. To do that, 
convert the vector to character, then call factor(), passing the new levels in 
the levels= argument. The result will look like the original; only the 
ordering of the levels will have changed. For example: 
> a <- factor (c("a", "b", "c", "b", "c", "b", "a", "c")) 
> a                    # Print a
[1] a b c b c b a c
> table (a)            # The table is produced in order of the levels
 a b c 
 2 3 3
#
# Convert a to character, then back to factor with a new vector of levels
#
> a <- factor (as.character(a), levels=c("c", "a", "b"))
> a                    # a is unchanged
[1] a b c b c b a c   
> table (a)            # The table is, too, but it's in a different order.
 c a b 
 3 2 3
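The effect of passing a new levels= vector can be modeled in Python. This is again a sketch with hypothetical names, not S-Plus code. The data values decode to the same strings, and only the codes and the order of the table change. The first level is now "c", which is the level treatment contrasts would use as the baseline.

```python
from collections import Counter


def refactor(values, levels):
    """Like factor(as.character(a), levels=...): same data, chosen level order."""
    return [levels.index(v) + 1 for v in values], list(levels)


def table(codes, levels):
    """Counts listed in level order, as table() prints them."""
    counts = Counter(codes)
    return [(lvl, counts[i + 1]) for i, lvl in enumerate(levels)]


values = ["a", "b", "c", "b", "c", "b", "a", "c"]
codes, levels = refactor(values, ["a", "b", "c"])
print(table(codes, levels))   # [('a', 2), ('b', 3), ('c', 3)]

codes, levels = refactor(values, ["c", "a", "b"])
print(table(codes, levels))   # [('c', 3), ('a', 2), ('b', 3)]
print([levels[c - 1] for c in codes] == values)  # True: the data are unchanged
print(levels[0])              # c is now the first (baseline) level
```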

Ordered Factors
An "ordered" factor is a factor whose levels have a particular order. Ordered 
factors inherit from factors, so anything that you can do to a factor you can 
do to an ordered factor. S-Plus models generally ignore the ordering even if 
you put it in there. The reorder.factor() function can be useful: it creates an 
ordered factor by breaking some numeric variable down into subsets according to 
the levels of the factor, calculating the mean of each subset, and then sorting 
the levels in increasing order of these means. (You don't have to use the mean, 
but that's the default.) This can be useful when preparing plots. In practice I 
don't make much use of ordered factors. 

Categories
Before S-Plus had factors it had something similar called categories. The 
category object is "deprecated": that means don't use it. However, the result 
of calling the cut() function is still a category, so you do run into these 
from time to time. My advice is to convert the results of a call to cut() to a 
factor right away. Otherwise you may find that the small differences between 
factors and categories come back to bite you. Here's an example: 
> x <- 1:20                             # Set up a vector...
> thing <- cut (x, c(0, 2, 5, 11, 21))  # ...pass it to cut
#
# The result of cut() is a category. This is a numeric vector with levels.
#
> thing
 [1] 1 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4
attr(, "levels"):
[1] " 0+ thru  2" " 2+ thru  5" " 5+ thru 11" "11+ thru 21"
#
# The levels print out, but the vector really is numeric. Notice that 
# the first three levels have a leading space.
#
# This looks right when you pass it to factor() or as.character()...
#
> as.character(thing)
 [1] " 0+ thru  2" " 0+ thru  2" " 2+ thru  5" " 2+ thru  5" " 2+ thru  5" " 5+ thru 11"
 [7] " 5+ thru 11" " 5+ thru 11" " 5+ thru 11" " 5+ thru 11" " 5+ thru 11" "11+ thru 21"
[13] "11+ thru 21" "11+ thru 21" "11+ thru 21" "11+ thru 21" "11+ thru 21" "11+ thru 21"
[19] "11+ thru 21" "11+ thru 21"
#
# ...but check this out. When you assign any element to anything that isn't
# in the levels, the whole vector is converted to numeric forever.
#
> thing[5] <- NA
> thing
 [1]  1  1  2  2 NA  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4
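The binning rule shown in cut()'s printed levels can be sketched in Python. Each value goes into the right-closed interval (breaks[i-1], breaks[i]] and gets code i. This is a model, not S-Plus's implementation. Values outside the breaks become None here, which is an assumption about out-of-range values. The label format only imitates the output shown above.

```python
from bisect import bisect_left


def cut(x, breaks):
    """Bin values into right-closed intervals (breaks[i-1], breaks[i]].

    Returns 1-based codes and level labels. Values outside
    (breaks[0], breaks[-1]] get None (an assumption standing in for NA).
    """
    codes = []
    for v in x:
        i = bisect_left(breaks, v)  # first index with breaks[i] >= v
        codes.append(i if 1 <= i < len(breaks) else None)
    labels = [f"{lo:>2}+ thru {hi:>2}" for lo, hi in zip(breaks, breaks[1:])]
    return codes, labels


codes, labels = cut(range(1, 21), [0, 2, 5, 11, 21])
print(codes)   # [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4]
print(labels)  # [' 0+ thru  2', ' 2+ thru  5', ' 5+ thru 11', '11+ thru 21']
```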


 sandeep khokhar



----- Original Message ----
From: ~Rick <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, 1 May, 2008 10:52:23 PM
Subject: Re: [c-prog] need a sample programme

At Wednesday 4/30/2008 06:40 PM, you wrote:
> >I've already written this program, with a few differences. Here's my
> >output. Can you tell me what grade your instructor will give me?
> >Please enter a number (0 to quit): 115746.50
> >Your number was 115746.50
> >ONE HUNDRED FIFTEEN THOUSAND SEVEN HUNDRED FORTY-SIX POINT FIVE ZERO
>maybe converting it to rupees, paise, and lakes helps mask the project? ;)
>Thanks,
>~~TheCreator~~


I'm not exactly sure how to do Indian currency. I'm guessing 1 Lake = 
100,000 Rupees. I'm not going to change my code logic specifically 
for the OP. I changed the currency names, though. The OP (i.e. 
student) will need to create their own program, of course.

Please enter a number (0 to quit): 1,15,746.50
One Hundred Fifteen Thousand Seven Hundred Forty Six Rupees and Fifty Paise

~Rick

>  ----- Original Message -----
>  From: ~Rick
>  To: [email protected]
>  Sent: Wednesday, April 30, 2008 4:36 PM
>  Subject: Re: [c-prog] need a sample programme
>
>
>  At Tuesday 4/29/2008 03:54 AM, you wrote:
>  >Dear friends have a nice time,
>  > i need sample programme in c to convert number into words.
>  >
>  >like:
>  >
>  >input : 1,15,746.50
>  >output: one lakes fifteen thousand seven hundred and forty six
>  >rupees and fifty paise
>  >i need your help.
>  >thanking you guys.
>
>  I've already written this program, with a few differences. Here's my
>  output. Can you tell me what grade your instructor will give me?
>
>  Please enter a number (0 to quit): 115746.50
>
>  Your number was 115746.50
>  ONE HUNDRED FIFTEEN THOUSAND SEVEN HUNDRED FORTY-SIX POINT FIVE ZERO