Re: Spark Avarage

2015-04-06 Thread baris akgun
Thanks for your replies. I solved the problem with this code:

val weathersRDD = sc.textFile(csvfilePath).map { line =>
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  (dayOfdate.substring(0, 7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
}.mapValues(x => (x, 1))
  .reduceByKey((x, y) =>
    ((x._1._1 + y._1._1, x._1._2 + y._1._2, x._1._3 + y._1._3), x._2 + y._2))
  .mapValues { case ((sumMin, sumMax, sumMean), count) =>
    ((1.0 * sumMin) / count, (1.0 * sumMax) / count, (1.0 * sumMean) / count)
  }.collectAsMap()
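For reference, the same monthly aggregation can be done in a single pass with aggregateByKey instead of mapValues plus reduceByKey. This is only a sketch, where monthPairs stands for the (month, (min, max, mean)) pair RDD produced by the first map step above:

```scala
// Accumulate (sumMin, sumMax, sumMean, count) per month in one pass,
// then divide each sum by the count to get the three averages.
val monthlyAvgs = monthPairs
  .aggregateByKey((0, 0, 0, 0))(
    (acc, v) => (acc._1 + v._1, acc._2 + v._2, acc._3 + v._3, acc._4 + 1),
    (a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3, a._4 + b._4))
  .mapValues { case (sumMin, sumMax, sumMean, count) =>
    (sumMin.toDouble / count, sumMax.toDouble / count, sumMean.toDouble / count)
  }
  .collectAsMap()
```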


But I will also try the DataFrame API.

Thanks again.





RE: Spark Avarage

2015-04-06 Thread Cheng, Hao
The DataFrame API should be perfectly suited to this case.
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html

A code snippet would look like this:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._
weathersRDD.toDF().registerTempTable("weathers")
val results = sqlContext.sql(
  "SELECT avg(minDeg), avg(maxDeg), avg(meanDeg) FROM weathers GROUP BY dayToMonth(dayOfDate)")
results.collect().foreach(println)
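Note that dayToMonth is not a Spark SQL built-in, so it would have to be registered as a UDF before the query above will run. A minimal sketch (assuming dayOfDate strings like 2014-03-17, so the first seven characters give the yyyy-MM month):

```scala
// Hypothetical UDF: maps "2014-03-17" to "2014-03" for the GROUP BY above.
sqlContext.udf.register("dayToMonth", (day: String) => day.substring(0, 7))
```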



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Avarage

2015-04-06 Thread barisak
Hi,

I have the following case class:

case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

I am reading the data from a CSV file and putting it into the weatherCond class with this code:

val weathersRDD = sc.textFile("weather.csv").map { line =>
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
}

The question is: how can I average the minDeg, maxDeg and meanDeg values for each month?

An example of the data set:

day, min, max, mean
2014-03-17,-3,5,5
2014-03-18,6,7,7
2014-03-19,6,14,10

The result has to be (2014-03, 3, 8.6, 7.3) -- the averages for 2014-03.

Thanks





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Avarage-tp22391.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Avarage

2015-04-06 Thread Yana Kadiyska
If you're going to do it this way, I would output dayOfdate.substring(0,7),
i.e. the month part, and instead of weatherCond you can use
(month, (minDeg, maxDeg, meanDeg)), i.e. a PairRDD, so weathersRDD:
RDD[(String, (Double, Double, Double))]. Then use reduceByKey as shown in
multiple Spark examples. You'd end up with the sum for each metric, and in
the end divide by the count to get the avg of each column. If you want to
use Algebird, you can output (month, (Avg(minDeg), Avg(maxDeg), Avg(meanDeg)))
and then all your reduce operations would simply be _ + _.

With that said, if you're using Spark 1.3, check out
https://github.com/databricks/spark-csv (you should likely use the CSV
package anyway, even with a lower version of Spark) and
https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame
(esp. the example at the top of the page). You'd just need .groupBy and .agg
if you set up the DataFrame column you're grouping by to contain just
the yyyy-MM portion of your date string.
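A rough sketch of that DataFrame route (assuming Spark 1.3, a sqlContext in scope, the weathersRDD of weatherCond records from the original post, and a hypothetical toMonth UDF for the yyyy-MM part):

```scala
import sqlContext.implicits._
import org.apache.spark.sql.functions.{avg, udf}

// Hypothetical column UDF: "2014-03-17" -> "2014-03".
val toMonth = udf((day: String) => day.substring(0, 7))

val df = weathersRDD.toDF() // columns: dayOfdate, minDeg, maxDeg, meanDeg
val monthly = df
  .groupBy(toMonth(df("dayOfdate")).as("month"))
  .agg(avg("minDeg"), avg("maxDeg"), avg("meanDeg"))
monthly.show()
```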

On Mon, Apr 6, 2015 at 10:50 AM, barisak baris.akg...@gmail.com wrote:
