[ https://issues.apache.org/jira/browse/SPARK-22010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166136#comment-16166136 ]
Maciej Bryński edited comment on SPARK-22010 at 9/14/17 12:05 PM:
------------------------------------------------------------------
The reason for this Jira is this profiling (attachment).
!profile_fact_dok.jpg|thumbnail!
As you can see, about 80% of PySpark time is spent in Spark internals.


was (Author: maver1ck):
The reason for this Jira is this profiling (attachment).
As you can see, about 80% of PySpark time is spent in Spark internals.


> Slow fromInternal conversion for TimestampType
> ----------------------------------------------
>
>                 Key: SPARK-22010
>                 URL: https://issues.apache.org/jira/browse/SPARK-22010
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Maciej Bryński
>         Attachments: profile_fact_dok.png
>
>
> To convert TimestampType to a Python datetime we currently use
> `datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)`.
> {code}
> In [34]: %%timeit
>     ...: datetime.datetime.fromtimestamp(1505383647).replace(microsecond=12344)
>     ...:
> 4.2 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> {code}
> It's slow because:
> # we look up the TZ on every conversion
> # we're using the replace method
> Proposed solution: a custom datetime conversion, with the TZ calculation moved to module level.
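
For illustration only (not part of the original report): a minimal sketch of what "custom datetime conversion with the TZ computed at module level" could look like. The helper names, the fixed-offset assumption, and the benchmark harness are hypothetical, not the actual Spark patch.

{code}
import datetime
import timeit

# Sketch only: compute the epoch in local time once, at module import,
# instead of doing a TZ lookup inside fromtimestamp() on every call.
# Assumption: a single fixed UTC offset is acceptable for the whole run
# (this ignores DST transitions, unlike fromtimestamp()).
_LOCAL_EPOCH = datetime.datetime.fromtimestamp(0)

def from_internal_fast(ts):
    # ts is Spark's internal value: microseconds since the Unix epoch.
    # One timedelta addition replaces fromtimestamp() + replace().
    return _LOCAL_EPOCH + datetime.timedelta(microseconds=ts)

def from_internal_current(ts):
    # Current approach quoted in the description above.
    return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)

if __name__ == "__main__":
    ts = 1505383647 * 1000000 + 12344  # same instant as the benchmark above
    for fn in (from_internal_current, from_internal_fast):
        per_call = timeit.timeit(lambda: fn(ts), number=100000) / 100000
        print(fn.__name__, "%.2f us" % (per_call * 1e6))
{code}

Whether a fixed offset (no per-value DST handling) is acceptable is exactly the trade-off the proposed solution would have to settle.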