[ https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun updated SPARK-8629:
------------------------

Description:

Data set:

DC_City    Dc_Code  ItemNo     Itemdescription              date        Month   Year  SalesQuantity
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  9/16/2012   9-Sep   2012  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  12/21/2012  12-Dec  2012  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  1/12/2013   1-Jan   2013  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  1/27/2013   1-Jan   2013  3
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/1/2013    2-Feb   2013  2
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/12/2013   2-Feb   2013  3
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/13/2013   2-Feb   2013  2
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/14/2013   2-Feb   2013  1
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/15/2013   2-Feb   2013  8
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/16/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/17/2013   2-Feb   2013  19
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/18/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/19/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/20/2013   2-Feb   2013  16
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/21/2013   2-Feb   2013  25
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/22/2013   2-Feb   2013  19
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/23/2013   2-Feb   2013  17
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/24/2013   2-Feb   2013  39
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/25/2013   2-Feb   2013  23

Code I used in R:

library(dplyr)                        # filter() and select() below are dplyr verbs

data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE)
factors <- unique(data$ItemNo)
df.allitems <- data.frame()           # empty data frame to accumulate the results

for (i in 1:length(factors)) {
  data1 <- filter(data, ItemNo == factors[[i]])
  data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity)  # keep the needed columns
  data2$date <- as.Date(data2$date, format = "%m/%d/%Y")   # parse the date (4-digit year)
  data3 <- data2[order(data2$date), ]                      # order by date, ascending
  df.allitems <- rbind(data3, df.allitems)                 # append by row bind
}

write.csv(df.allitems, "E:/all_items.csv")

-------------------------------------------------------------------------------

I have done some SparkR code:

data1 <- read.csv("D:/Data_sale_quantity_mini.csv")         # read in R
df_1 <- createDataFrame(sqlContext, data1)                  # convert the R data.frame to a Spark DataFrame
distinctDF <- distinct(df_1)                                 # remove duplicate rows

# for select I used:
df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity")

I don't know how to:
1) create an empty SparkR DataFrame
2) use a for loop in SparkR
3) change the date format
4) find the length() of a Spark DataFrame
5) use rbind in SparkR

Can you help me out in doing the above code in SparkR?
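A minimal sketch of how points 1-5 might look in SparkR (this assumes a Spark 1.4-era API with an existing sqlContext; the use of arrange, unionAll, count and collect here is an assumption about that API, not a verified solution, and the column names follow the data set above):

library(SparkR)

local_df <- read.csv("D:/Data_sale_quantity_mini.csv", stringsAsFactors = FALSE)
# 3) parse the m/d/yyyy strings locally and store them as "yyyy-mm-dd" text, so that
#    string ordering in Spark matches chronological ordering
local_df$date <- format(as.Date(local_df$date, format = "%m/%d/%Y"), "%Y-%m-%d")

df <- distinct(createDataFrame(sqlContext, local_df))        # Spark DataFrame without duplicate rows

# 4) count() is the SparkR counterpart of nrow()/length() on a DataFrame
n_rows <- count(df)

# 2) an ordinary R for loop works; first pull the distinct ItemNo values back to the driver
item_nos <- collect(distinct(select(df, "ItemNo")))$ItemNo

# 1) instead of an empty Spark DataFrame, start with NULL and fill it on the first pass
result <- NULL
for (item in item_nos) {
  part <- filter(df, df$ItemNo == item)
  part <- select(part, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity")
  part <- arrange(part, part$date)                           # order by date, ascending
  # 5) unionAll() plays the role of rbind() for SparkR DataFrames
  result <- if (is.null(result)) part else unionAll(result, part)
}

# bring the result back to the driver and write it out with plain R
write.csv(collect(result), "E:/all_items.csv", row.names = FALSE)

If per-item ordering is the only goal, the loop could probably be avoided altogether by sorting the selected DataFrame on both columns, e.g. arrange(df_2, df_2$ItemNo, df_2$date).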
> R code in SparkR
> ----------------
>
>                 Key: SPARK-8629
>                 URL: https://issues.apache.org/jira/browse/SPARK-8629
>             Project: Spark
>          Issue Type: Question
>          Components: R
>            Reporter: Arun
>            Priority: Minor
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)