Re: how to convert a sequence of TimeStamp to a dataframe

2015-08-03 Thread Michael Armbrust
In general it needs to be a Seq of Tuples for the implicit toDF to work
(which is a little tricky when there is only one column).

scala> Seq(Tuple1(new java.sql.Timestamp(System.currentTimeMillis))).toDF("a")
res3: org.apache.spark.sql.DataFrame = [a: timestamp]

or with multiple columns

scala> Seq(("1", new java.sql.Timestamp(System.currentTimeMillis))).toDF("a", "b")
res4: org.apache.spark.sql.DataFrame = [a: string, b: timestamp]
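
Applied to the Seq[java.sql.Timestamp] from the original post, the same idea just means wrapping each Timestamp in Tuple1 before calling toDF. A rough sketch (assuming the seqTimestamp value and the minInterval column name from that post; untested against 1.4):

import sqlContext.implicits._

// Wrap each Timestamp in Tuple1 so the Seq-of-Product implicit toDF applies.
val df = seqTimestamp.map(Tuple1(_)).toDF("minInterval")
df.printSchema()  // root |-- minInterval: timestamp (nullable = true)

// The same trick works on the RDD from the original attempt.
val df2 = sc.parallelize(seqTimestamp).map(Tuple1(_)).toDF("minInterval")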

On Fri, Jul 31, 2015 at 2:50 PM, Joanne Contact joannenetw...@gmail.com
wrote:

 Hi Guys,

 I have struggled for a while on this seemingly simple thing:

 I have a sequence of timestamps and want to create a dataframe with 1
 column.

 Seq[java.sql.Timestamp]

 //import collection.breakOut

 var seqTimestamp = scala.collection.Seq(listTs:_*)

 seqTimestamp: Seq[java.sql.Timestamp] = List(2015-07-22 16:52:00.0,
 2015-07-22 16:53:00.0, ...)

 I tried a lot of ways to create a dataframe and below is another failed
 way:

 import sqlContext.implicits._
 var rddTs = sc.parallelize(seqTimestamp)
 rddTs.toDF("minInterval")

 <console>:108: error: value toDF is not a member of
 org.apache.spark.rdd.RDD[java.sql.Timestamp]
        rddTs.toDF("minInterval")

 So, could any guru please tell me how to do this?

 I am not familiar with Scala or Spark. I wonder if learning Scala would
 help here at all? It just sounds like a lot of trial/error and googling.

 docs like

 https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/DataFrame.html

 https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/SQLContext.html#createDataFrame(scala.collection.Seq,
 scala.reflect.api.TypeTags.TypeTag)
 do not help.

 Btw, I am using Spark 1.4.

 Thanks in advance,

 J





how to convert a sequence of TimeStamp to a dataframe

2015-07-31 Thread Joanne Contact
Hi Guys,

I have struggled for a while on this seemingly simple thing:

I have a sequence of timestamps and want to create a dataframe with 1 column.

Seq[java.sql.Timestamp]

//import collection.breakOut

var seqTimestamp = scala.collection.Seq(listTs:_*)

seqTimestamp: Seq[java.sql.Timestamp] = List(2015-07-22 16:52:00.0,
2015-07-22 16:53:00.0, ...)

I tried a lot of ways to create a dataframe and below is another failed way:

import sqlContext.implicits._
var rddTs = sc.parallelize(seqTimestamp)
rddTs.toDF("minInterval")

<console>:108: error: value toDF is not a member of
org.apache.spark.rdd.RDD[java.sql.Timestamp]
       rddTs.toDF("minInterval")

So, could any guru please tell me how to do this?

I am not familiar with Scala or Spark. I wonder if learning Scala would
help here at all? It just sounds like a lot of trial/error and googling.

docs like
https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/DataFrame.html
https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/SQLContext.html#createDataFrame(scala.collection.Seq,
scala.reflect.api.TypeTags.TypeTag)
do not help.

Btw, I am using Spark 1.4.

Thanks in advance,

J




Re: how to convert a sequence of TimeStamp to a dataframe

2015-07-31 Thread Ted Yu
Please take a look at stringToTimestamp() in
./sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

Representing the timestamp as a long should work.
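
A minimal sketch of that idea (assuming the seqTimestamp value and the minInterval column name from the original post; note that a bigint-to-timestamp cast treats the value as seconds since the epoch, not milliseconds):

import org.apache.spark.sql.functions.col
import sqlContext.implicits._

// Represent each timestamp as epoch seconds (a Long), build the DataFrame from
// Tuple1s, then cast the column back to a Spark SQL timestamp.
val secondsDF = seqTimestamp.map(ts => Tuple1(ts.getTime / 1000)).toDF("epochSeconds")
val tsDF = secondsDF.select(col("epochSeconds").cast("timestamp").as("minInterval"))
tsDF.printSchema()  // root |-- minInterval: timestamp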

Cheers

On Fri, Jul 31, 2015 at 2:50 PM, Joanne Contact joannenetw...@gmail.com
wrote:

 Hi Guys,

 I have struggled for a while on this seemingly simple thing:

 I have a sequence of timestamps and want to create a dataframe with 1
 column.

 Seq[java.sql.Timestamp]

 //import collection.breakOut

 var seqTimestamp = scala.collection.Seq(listTs:_*)

 seqTimestamp: Seq[java.sql.Timestamp] = List(2015-07-22 16:52:00.0,
 2015-07-22 16:53:00.0, ...)

 I tried a lot of ways to create a dataframe and below is another failed
 way:

 import sqlContext.implicits._
 var rddTs = sc.parallelize(seqTimestamp)
 rddTs.toDF("minInterval")

 <console>:108: error: value toDF is not a member of
 org.apache.spark.rdd.RDD[java.sql.Timestamp]
        rddTs.toDF("minInterval")

 So, could any guru please tell me how to do this?

 I am not familiar with Scala or Spark. I wonder if learning Scala would
 help here at all? It just sounds like a lot of trial/error and googling.

 docs like

 https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/DataFrame.html

 https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/SQLContext.html#createDataFrame(scala.collection.Seq,
 scala.reflect.api.TypeTags.TypeTag)
 do not help.

 Btw, I am using Spark 1.4.

 Thanks in advance,

 J
