Re: Spark Avarage
Thanks for your replies, I solved the problem with this code:

val weathersRDD = sc.textFile(csvfilePath).map { line =>
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  Tuple2(dayOfdate.substring(0, 7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
}.mapValues(x => (x, 1))
  .reduceByKey((x, y) =>
    ((x._1._1 + y._1._1, x._1._2 + y._1._2, x._1._3 + y._1._3), x._2 + y._2))
  .mapValues { case ((sumMin, sumMax, sumMean), count) =>
    ((1.0 * sumMin) / count, (1.0 * sumMax) / count, (1.0 * sumMean) / count)
  }.collectAsMap()

but I will also try the DataFrame API. Thanks again.

2015-04-06 13:31 GMT-04:00 Cheng, Hao hao.ch...@intel.com:
> The Dataframe API should be perfectly helpful in this case.
> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html
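As a side note, the same monthly averages can be computed in a single pass with `aggregateByKey`, which threads the count through without the extra `mapValues(x => (x, 1))` step. A sketch only, not the code used above, assuming `weathersRDD` here names just the keyed RDD of (month, (min, max, mean)) before the averaging steps:

```scala
// Running accumulator per month: (sumMin, sumMax, sumMean, count)
val monthlyAvgs = weathersRDD
  .aggregateByKey((0, 0, 0, 0))(
    // Fold one day's (min, max, mean) readings into the running sums.
    (acc, v) => (acc._1 + v._1, acc._2 + v._2, acc._3 + v._3, acc._4 + 1),
    // Merge partial accumulators from different partitions.
    (a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3, a._4 + b._4))
  .mapValues { case (sumMin, sumMax, sumMean, count) =>
    (sumMin.toDouble / count, sumMax.toDouble / count, sumMean.toDouble / count)
  }
  .collectAsMap()
```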
RE: Spark Avarage
The Dataframe API should be perfectly helpful in this case.
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html

Some code snippet will look like:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._

weathersRDD.toDF().registerTempTable("weathers")
val results = sqlContext.sql(
  "SELECT avg(minDeg), avg(maxDeg), avg(meanDeg) FROM weathers GROUP BY dayToMonth(dayOfDate)")
results.collect.foreach(println)

-----Original Message-----
From: barisak [mailto:baris.akg...@gmail.com]
Sent: Monday, April 6, 2015 10:50 PM
To: user@spark.apache.org
Subject: Spark Avarage
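One thing worth noting: the query above calls a `dayToMonth` function that Spark SQL does not provide out of the box, so it would need to be registered first. A hypothetical sketch of such a UDF (the name and the truncation-to-yyyy-MM behavior are assumptions inferred from the query, using `sqlContext.udf.register` as in Spark 1.3):

```scala
// Register a UDF so the SQL query can call dayToMonth(dayOfDate).
// "2014-03-17".substring(0, 7) == "2014-03"
sqlContext.udf.register("dayToMonth", (day: String) => day.substring(0, 7))
```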
Spark Avarage
Hi,

I have a class as described below:

case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

I am reading the data from a csv file and I put this data into the weatherCond class with this code:

val weathersRDD = sc.textFile("weather.csv").map { line =>
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
}

The question is: how can I average the minDeg, maxDeg and meanDeg values for each month?

The data set example:

day, min, max, mean
2014-03-17,-3,5,5
2014-03-18,6,7,7
2014-03-19,6,14,10

The result has to be (2014-03, 3, 8.6, 7.3) -- (average for 2014-03).

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Avarage-tp22391.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Spark Avarage
If you're going to do it this way, I would output dayOfdate.substring(0, 7), i.e. the month part, and instead of weatherCond you can use (month, (minDeg, maxDeg, meanDeg)), i.e. a PairRDD, so weathersRDD: RDD[(String, (Double, Double, Double))]. Then use a reduceByKey as shown in multiple Spark examples. You'd end up with the sum for each metric, and in the end divide by the count to get the avg of each column. If you want to use Algebird you can output (month, (Avg(minDeg), Avg(maxDeg), Avg(meanDeg))) and then all your reduce operations would be _ + _.

With that said, if you're using Spark 1.3 check out https://github.com/databricks/spark-csv (you should likely use the CSV package anyway, even with a lower version of Spark) and https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame (esp. the example at the top of the file). You'd just need .groupBy and .agg if you set up the dataframe column you're grouping by to contain just the yyyy-MM portion of your date string.

On Mon, Apr 6, 2015 at 10:50 AM, barisak baris.akg...@gmail.com wrote:
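The .groupBy/.agg route might look roughly like this — a sketch only, assuming Spark 1.3, the weatherCond RDD from the original question, and the `Column.substr` / `functions.avg` API:

```scala
import org.apache.spark.sql.functions.avg
import sqlContext.implicits._

// DataFrame from the case-class RDD, plus a derived yyyy-MM month column.
val df = weathersRDD.toDF()
val monthly = df
  .withColumn("month", df("dayOfdate").substr(1, 7)) // "2014-03-17" -> "2014-03"
  .groupBy("month")
  .agg(avg("minDeg"), avg("maxDeg"), avg("meanDeg"))

monthly.show()
```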
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org