[ https://issues.apache.org/jira/browse/SPARK-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025853#comment-16025853 ]
Yan Facai (颜发才) commented on SPARK-20787:
-----------------------------------------

It seems that the exception is raised during serialization. Microseconds are used as the internal representation of Timestamp in PySpark. Since the same conversion works in Scala, the problem must be somewhere in the Python code (a minimal, illustrative sketch of the failing conversion and a pre-1900-safe alternative follows the quoted report below):

{code}
scala> val df = Seq("05/26/1800 01:01:01").toDF("dt")
scala> val ts = unix_timestamp($"dt","MM/dd/yyyy HH:mm:ss").cast("timestamp")
scala> df.withColumn("ts", ts).show(false)
+-------------------+---------------------+
|dt                 |ts                   |
+-------------------+---------------------+
|05/26/1800 01:01:01|1800-05-26 01:01:01.0|
+-------------------+---------------------+
{code}

> PySpark can't handle datetimes before 1900
> ------------------------------------------
>
>                 Key: SPARK-20787
>                 URL: https://issues.apache.org/jira/browse/SPARK-20787
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Keith Bourgoin
>
> When trying to put a datetime before 1900 into a DataFrame, it throws an error because of the use of time.mktime.
> {code}
> Python 2.7.13 (default, Mar 8 2017, 17:29:55)
> Type "copyright", "credits" or "license" for more information.
> IPython 5.3.0 -- An enhanced Interactive Python.
> ?         -> Introduction and overview of IPython's features.
> %quickref -> Quick reference.
> help      -> Python's own help system.
> object?   -> Details about 'object', use 'object??' for extra details.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 17/05/17 12:45:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 17/05/17 12:46:02 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
>       /_/
> Using Python version 2.7.13 (default, Mar 8 2017 17:29:55)
> SparkSession available as 'spark'.
> In [1]: import datetime as dt
> In [2]: sqlContext.createDataFrame(sc.parallelize([[dt.datetime(1899,12,31)]])).count()
> 17/05/17 12:46:16 ERROR Executor: Exception in task 3.0 in stage 2.0 (TID 7)
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
>     process()
>   File "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 268, in dump_stream
>     vs = list(itertools.islice(iterator, batch))
>   File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, in toInternal
>     return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
>   File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, in <genexpr>
>     return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
>   File "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 436, in toInternal
>     return self.dataType.toInternal(obj)
>   File "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>     else time.mktime(dt.timetuple()))
> ValueError: year out of range
> at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
> at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
> at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 17/05/17
12:46:16 WARN TaskSetManager: Lost task 3.0 in stage 2.0 (TID 7, > localhost, executor driver): org.apache.spark.api.python.PythonException: > Traceback (most recent call last): > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line > 174, in main > process() > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line > 169, in process > serializer.dump_stream(func(split_index, iterator), outfile) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 268, in dump_stream > vs = list(itertools.islice(iterator, batch)) > File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, > in toInternal > return tuple(f.toInternal(v) for f, v in zip(self.fields, obj)) > File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, > in <genexpr> > return tuple(f.toInternal(v) for f, v in zip(self.fields, obj)) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 436, in toInternal > return self.dataType.toInternal(obj) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 191, in toInternal > else time.mktime(dt.timetuple())) > ValueError: year out of range > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193) > at > org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234) > at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/05/17 12:46:16 ERROR TaskSetManager: Task 3 in stage 2.0 failed 1 times; > aborting job > 17/05/17 12:46:16 WARN 
TaskSetManager: Lost task 1.0 in stage 2.0 (TID 5, > localhost, executor driver): TaskKilled (killed intentionally) > 17/05/17 12:46:16 WARN TaskSetManager: Lost task 2.0 in stage 2.0 (TID 6, > localhost, executor driver): TaskKilled (killed intentionally) > 17/05/17 12:46:16 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 4, > localhost, executor driver): TaskKilled (killed intentionally) > --------------------------------------------------------------------------- > Py4JJavaError Traceback (most recent call last) > <ipython-input-2-7e1f7293354f> in <module>() > ----> 1 > sqlContext.createDataFrame(sc.parallelize([[dt.datetime(1899,12,31)]])).count() > /home/kfb/src/projects/spark/python/pyspark/sql/dataframe.pyc in count(self) > 378 2 > 379 """ > --> 380 return int(self._jdf.count()) > 381 > 382 @ignore_unicode_prefix > /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py > in __call__(self, *args) > 1131 answer = self.gateway_client.send_command(command) > 1132 return_value = get_return_value( > -> 1133 answer, self.gateway_client, self.target_id, self.name) > 1134 > 1135 for temp_arg in temp_args: > /home/kfb/src/projects/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw) > 61 def deco(*a, **kw): > 62 try: > ---> 63 return f(*a, **kw) > 64 except py4j.protocol.Py4JJavaError as e: > 65 s = e.java_exception.toString() > /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py > in get_return_value(answer, gateway_client, target_id, name) > 317 raise Py4JJavaError( > 318 "An error occurred while calling {0}{1}{2}.\n". > --> 319 format(target_id, ".", name), value) > 320 else: > 321 raise Py4JError( > Py4JJavaError: An error occurred while calling o58.count. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 > in stage 2.0 failed 1 times, most recent failure: Lost task 3.0 in stage 2.0 > (TID 7, localhost, executor driver): > org.apache.spark.api.python.PythonException: Traceback (most recent call > last): > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line > 174, in main > process() > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line > 169, in process > serializer.dump_stream(func(split_index, iterator), outfile) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 268, in dump_stream > vs = list(itertools.islice(iterator, batch)) > File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, > in toInternal > return tuple(f.toInternal(v) for f, v in zip(self.fields, obj)) > File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, > in <genexpr> > return tuple(f.toInternal(v) for f, v in zip(self.fields, obj)) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 436, in toInternal > return self.dataType.toInternal(obj) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 191, in toInternal > else time.mktime(dt.timetuple())) > ValueError: year out of range > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193) > at > org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234) > at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958) > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) > at org.apache.spark.rdd.RDD.collect(RDD.scala:934) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377) > at > org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405) > at > org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404) > at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778) > at org.apache.spark.sql.Dataset.count(Dataset.scala:2404) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:214) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.spark.api.python.PythonException: Traceback (most > recent call last): > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line > 174, in main > process() > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/worker.py", line > 169, in process > serializer.dump_stream(func(split_index, iterator), outfile) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 268, in dump_stream > vs = list(itertools.islice(iterator, batch)) > File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, > in toInternal > return tuple(f.toInternal(v) for f, v in zip(self.fields, obj)) > File "/home/kfb/src/projects/spark/python/pyspark/sql/types.py", line 576, > in <genexpr> > return tuple(f.toInternal(v) for f, v in zip(self.fields, obj)) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 436, in toInternal > return self.dataType.toInternal(obj) > File > "/home/kfb/src/projects/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 191, in toInternal > else time.mktime(dt.timetuple())) > ValueError: year out of range > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193) > at > org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234) > at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ... 1 more
> In [3]: sqlContext.createDataFrame(sc.parallelize([[dt.datetime(1900,1,1)]])).count()
> Out[3]: 1
> {code}
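
To make the comment above concrete: the traceback pins the failure on the time.mktime call in pyspark/sql/types.py (line 191 in the affected build), and on the reporter's Python 2 setup time.mktime rejects years before 1900. The snippet below is a minimal, hypothetical sketch, not PySpark's actual implementation or the eventual fix; the function names are invented, and only the mktime call and the microseconds-since-epoch internal representation are taken from the report and comment above.

{code}
# Illustrative sketch only -- not PySpark's actual code.
# Spark stores timestamps internally as microseconds since the Unix epoch;
# the failing branch in pyspark/sql/types.py goes through time.mktime, which
# on the reporter's Python 2 build raises ValueError for years before 1900.
from __future__ import print_function

import datetime as dt
import time

EPOCH = dt.datetime(1970, 1, 1)

def to_internal_via_mktime(d):
    # Roughly the failing path from the traceback: seconds via time.mktime,
    # then scaled to microseconds. Assumes a naive (timezone-less) datetime.
    return int(time.mktime(d.timetuple())) * 1000000 + d.microsecond

def to_internal_via_timedelta(d):
    # Hypothetical pre-1900-safe alternative: plain datetime arithmetic
    # against the epoch, which has no 1900 lower bound. For simplicity this
    # treats d as UTC; a real fix would still have to account for the local
    # timezone the way the working Scala path does.
    delta = d - EPOCH
    return (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds

old = dt.datetime(1899, 12, 31)

print(to_internal_via_timedelta(old))   # negative microsecond count, no error
try:
    print(to_internal_via_mktime(old))
except ValueError as e:
    print("mktime failed:", e)          # "year out of range" on the affected setup
{code}

As the Scala example in the comment shows, the internal microsecond representation itself copes with 1800-era values, so only the Python-side conversion needs attention.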