Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread VISHNU SUBRAMANIAN
Hi,

If you need a DataFrame-specific solution, you can try the below:

df.select(from_unixtime(col("max(utcTimestamp)")/1000))
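
A minimal self-contained sketch of that approach, assuming a hypothetical DataFrame
with a millisecond epoch column named utcTimestamp (from_unixtime expects seconds,
hence the division by 1000):

from pyspark.sql.functions import col, from_unixtime

# hypothetical sample: one epoch time in milliseconds
df = sqlContext.createDataFrame([(1456050620000,)], ["utcTimestamp"])

# from_unixtime expects seconds, so divide by 1000 before converting,
# then cast the formatted string to a timestamp column
df = df.select(
  from_unixtime((col("utcTimestamp") / 1000).cast("long"))
    .cast("timestamp").alias("ts"))
df.show(truncate=False)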


Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread Ted Yu
See related thread on using Joda DateTime:
http://search-hadoop.com/m/q3RTtSfi342nveex1&subj=RE+NPE+when+using+Joda+DateTime


Re: how to convert millisecond time to SQL timestamp

2016-02-01 Thread Kevin Mellott
I've had pretty good success using Joda-Time for date/time manipulations
within Spark applications. You may be able to use the *DateTime* constructor
below, if you are starting with milliseconds.

DateTime

public DateTime(long instant)

Constructs an instance set to the milliseconds from 1970-01-01T00:00:00Z
using ISOChronology in the default time zone.
Parameters: instant - the milliseconds from 1970-01-01T00:00:00Z
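
That DateTime(long) constructor is Java/Scala-side (Joda-Time). A rough pyspark
sketch of the same idea, not Joda itself, using Python's standard datetime in a
UDF and a hypothetical millisecond column named "ms", might look like:

from datetime import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

# fromtimestamp expects seconds and, like the Joda constructor above,
# uses the default (local) time zone
ms_to_ts = udf(lambda ms: datetime.fromtimestamp(ms / 1000.0), TimestampType())

df = sqlContext.createDataFrame([(1456050620000,)], ["ms"])
df.select(ms_to_ts(df.ms).alias("ts")).show(truncate=False)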


how to convert millisecond time to SQL timestamp

2016-02-01 Thread Andy Davidson
What little I know about working with timestamps is based on
https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html

Using the example of dates formatted into human-friendly strings -> timestamps,
I was able to figure out how to convert epoch times to timestamps. The same
trick did not work for millisecond times.

Any suggestions would be greatly appreciated.


Andy

Working with epoch times

ref: http://www.epochconverter.com/
Epoch timestamp: 1456050620
Timestamp in milliseconds: 1456050620000
Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT
Human time (your time zone): 2/21/2016, 2:30:20 AM

# Epoch time stamp example
data = [
  ("1456050620", "1456050621", 1),
  ("1456050622", "14560506203", 2),
  ("14560506204", "14560506205", 3)]
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# convert epoch time strings into longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
# convert longs to timestamps
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-----------+-----------+---+
|start_time |end_time   |id |
+-----------+-----------+---+
|1456050620 |1456050621 |1  |
|1456050622 |14560506203|2  |
|14560506204|14560506205|3  |
+-----------+-----------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+---------------------+---------------------+---+
|start_time           |end_time             |id |
+---------------------+---------------------+---+
|2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1  |
|2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2  |
|2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3  |
+---------------------+---------------------+---+

In [21]:
# working with millisecond times
data = [
  ("1456050620000", "1456050620000", 1)]
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# convert millisecond time strings into longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
# convert longs to timestamps
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-------------+-------------+---+
|start_time   |end_time     |id |
+-------------+-------------+---+
|1456050620000|1456050620000|1  |
+-------------+-------------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+----------------------+----------------------+---+
|start_time            |end_time              |id |
+----------------------+----------------------+---+
|48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1  |
+----------------------+----------------------+---+
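
A minimal sketch of one way the millisecond case could work, assuming the values
really are epoch milliseconds: divide by 1000 before the final cast, since the
cast to timestamp treats a numeric value as seconds since the epoch.

# starting again from the original millisecond strings
df = sqlContext.createDataFrame(
  [("1456050620000", "1456050620000", 1)],
  ["start_time", "end_time", "id"])

# divide the millisecond value by 1000; the timestamp cast then interprets
# the resulting number as seconds since the epoch
df = df.select(
  (df.start_time.cast("long") / 1000).cast("timestamp").alias("start_time"),
  (df.end_time.cast("long") / 1000).cast("timestamp").alias("end_time"),
  df.id)
df.show(truncate=False)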