Tanveer created SPARK-32116:
-------------------------------

             Summary: Python RDD containing a pyarrow RecordBatch object to Java RDD conversion issue
                 Key: SPARK-32116
                 URL: https://issues.apache.org/jira/browse/SPARK-32116
             Project: Spark
          Issue Type: Question
          Components: PySpark
    Affects Versions: 2.3.4
            Reporter: Tanveer


I want to convert a Python RDD ('prdd') containing a pyarrow RecordBatch
object into a Java RDD ('jrdd'), and then use that jrdd to build a Spark
DataFrame. But I am running into an issue when dealing with it; please see
the attached log. I am new to Spark and have been struggling with this for
many days. All of the Arrow-related code in the PySpark repository covers
pandas-to-Arrow conversion, but I want to make an RDD ('ardd') from Arrow
record batches and then convert it into a Spark DataFrame.

Is my approach right?

No one is answering on the mailing list. Please, someone, guide me on this
issue. Thanks.

{code:python}
import pyarrow as pa
from pyspark.rdd import RDD
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Python Arrow-in-Spark example").getOrCreate()

# Build a pyarrow RecordBatch with two columns.
data = [pa.array(range(5), type='int16'),
        pa.array([-10, -5, 0, None, 10], type='int32')]
batch = pa.record_batch(data, ['c0', 'c1'])

# Distribute the batch, convert it to a Java RDD, and wrap the result back
# into a Python RDD.
data_rdd = spark.sparkContext.parallelize(batch)
data_java_rdd = data_rdd._to_java_object_rdd()
data_python_rdd = spark.sparkContext._jvm.SerDeUtil.javaToPython(data_java_rdd)
converted_rdd = RDD(data_python_rdd, spark.sparkContext)
print(converted_rdd.count())
{code}
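From the stack trace below, the failure happens on the JVM side: Spark unpickles the RDD elements with Pyrolite ({{net.razorvine.pickle}}), which cannot reconstruct pyarrow objects, hence the {{PickleException: expected zero arguments for construction of ClassDict (for pyarrow.lib.type_for_alias)}} entries in the log. One way to avoid that might be to keep only plain bytes in the RDD, serializing each RecordBatch to the Arrow IPC stream format and rebuilding rows per partition. A rough sketch of that idea (reusing {{spark}} and {{batch}} from the snippet above; I have not verified it against Spark 2.3.4):

{code:python}
import pyarrow as pa

def batch_to_ipc_bytes(batch):
    # Serialize a RecordBatch to Arrow IPC stream bytes; plain bytes survive
    # Spark's Python/JVM pickling round-trip, unlike pyarrow objects.
    sink = pa.BufferOutputStream()
    writer = pa.RecordBatchStreamWriter(sink, batch.schema)
    writer.write_batch(batch)
    writer.close()
    return sink.getvalue().to_pybytes()

def ipc_bytes_to_rows(buf):
    # Rebuild the batch from the IPC bytes on the worker and turn it into
    # plain row tuples that Spark can handle.
    reader = pa.RecordBatchStreamReader(pa.BufferReader(buf))
    table = reader.read_all()
    columns = table.to_pydict()
    return list(zip(*(columns[name] for name in table.column_names)))

rows_rdd = (spark.sparkContext
            .parallelize([batch_to_ipc_bytes(batch)])
            .flatMap(ipc_bytes_to_rows))
df = spark.createDataFrame(rows_rdd, ['c0', 'c1'])
df.show()
{code}
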
Log:
{code:java}
2020-06-28 07:09:54 WARN NativeCodeLoader:62 - Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
2020-06-28 07:09:55 INFO SparkContext:54 - Running Spark version 2.3.4
2020-06-28 07:09:55 INFO SparkContext:54 - Submitted application: Python 
Arrow-in-Spark example
2020-06-28 07:09:55 INFO SecurityManager:54 - Changing view acls to: tahmad
2020-06-28 07:09:55 INFO SecurityManager:54 - Changing modify acls to: tahmad
2020-06-28 07:09:55 INFO SecurityManager:54 - Changing view acls groups to: 
2020-06-28 07:09:55 INFO SecurityManager:54 - Changing modify acls groups to: 
2020-06-28 07:09:55 INFO SecurityManager:54 - SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(tahmad); groups 
with view permissions: Set(); users with modify permissions: Set(tahmad); 
groups with modify permissions: Set()
2020-06-28 07:09:55 INFO Utils:54 - Successfully started service 'sparkDriver' 
on port 33475.
2020-06-28 07:09:55 INFO SparkEnv:54 - Registering MapOutputTracker
2020-06-28 07:09:55 INFO SparkEnv:54 - Registering BlockManagerMaster
2020-06-28 07:09:55 INFO BlockManagerMasterEndpoint:54 - Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2020-06-28 07:09:55 INFO BlockManagerMasterEndpoint:54 - 
BlockManagerMasterEndpoint up
2020-06-28 07:09:55 INFO DiskBlockManager:54 - Created local directory at 
/tmp/blockmgr-e7d2bdc7-ae0f-4186-b1c1-bcde7bbdccfa
2020-06-28 07:09:55 INFO MemoryStore:54 - MemoryStore started with capacity 
366.3 MB
2020-06-28 07:09:55 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2020-06-28 07:09:55 INFO log:192 - Logging initialized @2270ms
2020-06-28 07:09:55 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: 
unknown, git hash: unknown
2020-06-28 07:09:55 INFO Server:419 - Started @2337ms
2020-06-28 07:09:55 INFO AbstractConnector:278 - Started 
ServerConnector@27de2e9{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-06-28 07:09:55 INFO Utils:54 - Successfully started service 'SparkUI' on 
port 4040.
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@4a902f7b{/jobs,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@58cbdcbe{/jobs/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@4b0b49e8{/jobs/job,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@61bc464a{/jobs/job/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@671448ef{/stages,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@764119ec{/stages/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@7a535077{/stages/stage,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@42385b89{/stages/stage/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@22086b51{/stages/pool,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@2d62d5be{/stages/pool/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@540f4794{/storage,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@546bba8f{/storage/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@35d811b0{/storage/rdd,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@eacfd90{/storage/rdd/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@20b13836{/environment,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@2987516d{/environment/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@5c27907c{/executors,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@358eb615{/executors/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@1cd7481a{/executors/threadDump,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@fc87ed4{/executors/threadDump/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@6f8cab21{/static,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@7d4c5b58{/,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@54cdd155{/api,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@1c5a66d2{/jobs/job/kill,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@9218486{/stages/stage/kill,null,AVAILABLE,@Spark}
2020-06-28 07:09:55 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at 
http://tcn862:4040
2020-06-28 07:09:55 INFO SparkContext:54 - Added file 
file:/nfs/home3/tahmad/tahmad/script.py at 
file:/nfs/home3/tahmad/tahmad/script.py with timestamp 1593320995973
2020-06-28 07:09:55 INFO Utils:54 - Copying /nfs/home3/tahmad/tahmad/script.py 
to 
/tmp/spark-405f8ca1-a57f-4cae-8fa4-d459dd74b5d7/userFiles-d8452ced-0f8d-47b0-9d31-95fd736628a4/script.py
2020-06-28 07:09:56 INFO Executor:54 - Starting executor ID driver on host 
localhost
2020-06-28 07:09:56 INFO Utils:54 - Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 32978.
2020-06-28 07:09:56 INFO NettyBlockTransferService:54 - Server created on 
tcn862:32978
2020-06-28 07:09:56 INFO BlockManager:54 - Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
2020-06-28 07:09:56 INFO BlockManagerMaster:54 - Registering BlockManager 
BlockManagerId(driver, tcn862, 32978, None)
2020-06-28 07:09:56 INFO BlockManagerMasterEndpoint:54 - Registering block 
manager tcn862:32978 with 366.3 MB RAM, BlockManagerId(driver, tcn862, 32978, 
None)
2020-06-28 07:09:56 INFO BlockManagerMaster:54 - Registered BlockManager 
BlockManagerId(driver, tcn862, 32978, None)
2020-06-28 07:09:56 INFO BlockManager:54 - Initialized BlockManager: 
BlockManagerId(driver, tcn862, 32978, None)
2020-06-28 07:09:56 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@56ce8be9{/metrics/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:56 INFO SharedState:54 - Setting hive.metastore.warehouse.dir 
('null') to the value of spark.sql.warehouse.dir 
('file:/nfs/home3/tahmad/tahmad/spark-warehouse/').
2020-06-28 07:09:56 INFO SharedState:54 - Warehouse path is 
'file:/nfs/home3/tahmad/tahmad/spark-warehouse/'.
2020-06-28 07:09:56 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@36e1dd67{/SQL,null,AVAILABLE,@Spark}
2020-06-28 07:09:56 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@77e876c7{/SQL/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:56 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@2fb8cdb0{/SQL/execution,null,AVAILABLE,@Spark}
2020-06-28 07:09:56 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@651413b3{/SQL/execution/json,null,AVAILABLE,@Spark}
2020-06-28 07:09:56 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@7db9e9cb{/static/sql,null,AVAILABLE,@Spark}
2020-06-28 07:09:56 INFO StateStoreCoordinatorRef:54 - Registered 
StateStoreCoordinator endpoint
2020-06-28 07:09:56 INFO SparkContext:54 - Starting job: count at 
/nfs/home3/tahmad/tahmad/script.py:301
2020-06-28 07:09:56 INFO DAGScheduler:54 - Got job 0 (count at 
/nfs/home3/tahmad/tahmad/script.py:301) with 24 output partitions
2020-06-28 07:09:56 INFO DAGScheduler:54 - Final stage: ResultStage 0 (count at 
/nfs/home3/tahmad/tahmad/script.py:301)
2020-06-28 07:09:56 INFO DAGScheduler:54 - Parents of final stage: List()
2020-06-28 07:09:56 INFO DAGScheduler:54 - Missing parents: List()
2020-06-28 07:09:56 INFO DAGScheduler:54 - Submitting ResultStage 0 
(PythonRDD[4] at count at /nfs/home3/tahmad/tahmad/script.py:301), which has no 
missing parents
2020-06-28 07:09:56 INFO MemoryStore:54 - Block broadcast_0 stored as values in 
memory (estimated size 7.1 KB, free 366.3 MB)
2020-06-28 07:09:57 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as 
bytes in memory (estimated size 4.2 KB, free 366.3 MB)
2020-06-28 07:09:57 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in 
memory on tcn862:32978 (size: 4.2 KB, free: 366.3 MB)
2020-06-28 07:09:57 INFO SparkContext:54 - Created broadcast 0 from broadcast 
at DAGScheduler.scala:1039
2020-06-28 07:09:57 INFO DAGScheduler:54 - Submitting 24 missing tasks from 
ResultStage 0 (PythonRDD[4] at count at /nfs/home3/tahmad/tahmad/script.py:301) 
(first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14))
2020-06-28 07:09:57 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 24 
tasks
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 
(TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 
(TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 
(TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 
(TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 4.0 in stage 0.0 
(TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 5.0 in stage 0.0 
(TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 6.0 in stage 0.0 
(TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 7.0 in stage 0.0 
(TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 8.0 in stage 0.0 
(TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 9.0 in stage 0.0 
(TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 10.0 in stage 0.0 
(TID 10, localhost, executor driver, partition 10, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 11.0 in stage 0.0 
(TID 11, localhost, executor driver, partition 11, PROCESS_LOCAL, 7999 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 12.0 in stage 0.0 
(TID 12, localhost, executor driver, partition 12, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 13.0 in stage 0.0 
(TID 13, localhost, executor driver, partition 13, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 14.0 in stage 0.0 
(TID 14, localhost, executor driver, partition 14, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 15.0 in stage 0.0 
(TID 15, localhost, executor driver, partition 15, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 16.0 in stage 0.0 
(TID 16, localhost, executor driver, partition 16, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 17.0 in stage 0.0 
(TID 17, localhost, executor driver, partition 17, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 18.0 in stage 0.0 
(TID 18, localhost, executor driver, partition 18, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 19.0 in stage 0.0 
(TID 19, localhost, executor driver, partition 19, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 20.0 in stage 0.0 
(TID 20, localhost, executor driver, partition 20, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 21.0 in stage 0.0 
(TID 21, localhost, executor driver, partition 21, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 22.0 in stage 0.0 
(TID 22, localhost, executor driver, partition 22, PROCESS_LOCAL, 7839 bytes)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Starting task 23.0 in stage 0.0 
(TID 23, localhost, executor driver, partition 23, PROCESS_LOCAL, 8009 bytes)
2020-06-28 07:09:57 INFO Executor:54 - Running task 7.0 in stage 0.0 (TID 7)
2020-06-28 07:09:57 INFO Executor:54 - Running task 5.0 in stage 0.0 (TID 5)
2020-06-28 07:09:57 INFO Executor:54 - Running task 12.0 in stage 0.0 (TID 12)
2020-06-28 07:09:57 INFO Executor:54 - Running task 14.0 in stage 0.0 (TID 14)
2020-06-28 07:09:57 INFO Executor:54 - Running task 6.0 in stage 0.0 (TID 6)
2020-06-28 07:09:57 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2020-06-28 07:09:57 INFO Executor:54 - Running task 9.0 in stage 0.0 (TID 9)
2020-06-28 07:09:57 INFO Executor:54 - Running task 8.0 in stage 0.0 (TID 8)
2020-06-28 07:09:57 INFO Executor:54 - Running task 10.0 in stage 0.0 (TID 10)
2020-06-28 07:09:57 INFO Executor:54 - Running task 4.0 in stage 0.0 (TID 4)
2020-06-28 07:09:57 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2020-06-28 07:09:57 INFO Executor:54 - Running task 3.0 in stage 0.0 (TID 3)
2020-06-28 07:09:57 INFO Executor:54 - Running task 11.0 in stage 0.0 (TID 11)
2020-06-28 07:09:57 INFO Executor:54 - Running task 2.0 in stage 0.0 (TID 2)
2020-06-28 07:09:57 INFO Executor:54 - Running task 13.0 in stage 0.0 (TID 13)
2020-06-28 07:09:57 INFO Executor:54 - Running task 23.0 in stage 0.0 (TID 23)
2020-06-28 07:09:57 INFO Executor:54 - Running task 22.0 in stage 0.0 (TID 22)
2020-06-28 07:09:57 INFO Executor:54 - Running task 21.0 in stage 0.0 (TID 21)
2020-06-28 07:09:57 INFO Executor:54 - Running task 20.0 in stage 0.0 (TID 20)
2020-06-28 07:09:57 INFO Executor:54 - Running task 19.0 in stage 0.0 (TID 19)
2020-06-28 07:09:57 INFO Executor:54 - Running task 18.0 in stage 0.0 (TID 18)
2020-06-28 07:09:57 INFO Executor:54 - Running task 17.0 in stage 0.0 (TID 17)
2020-06-28 07:09:57 INFO Executor:54 - Running task 16.0 in stage 0.0 (TID 16)
2020-06-28 07:09:57 INFO Executor:54 - Running task 15.0 in stage 0.0 (TID 15)
2020-06-28 07:09:57 INFO Executor:54 - Fetching 
file:/nfs/home3/tahmad/tahmad/script.py with timestamp 1593320995973
2020-06-28 07:09:57 INFO Utils:54 - /nfs/home3/tahmad/tahmad/script.py has been 
previously copied to 
/tmp/spark-405f8ca1-a57f-4cae-8fa4-d459dd74b5d7/userFiles-d8452ced-0f8d-47b0-9d31-95fd736628a4/script.py
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 334, boot = 293, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 332, boot = 292, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 330, boot = 290, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 340, boot = 299, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 341, boot = 301, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 329, boot = 289, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 328, boot = 287, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 344, boot = 303, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 345, boot = 305, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 326, boot = 285, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 347, boot = 307, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 324, boot = 284, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 349, boot = 308, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 322, boot = 282, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 350, boot = 310, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 319, boot = 279, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 352, boot = 312, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 354, boot = 314, init 
= 40, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 320, boot = 274, init 
= 46, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 319, boot = 276, init 
= 43, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 320, boot = 278, init 
= 42, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 48, boot = 6, init = 
42, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 46, boot = 3, init = 
42, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 51, boot = 10, init = 
40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 54, boot = 13, init = 
40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 56, boot = 15, init = 
40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 356, boot = 315, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 61, boot = 20, init = 
40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 65, boot = 24, init = 
41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 69, boot = 28, init = 
41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 72, boot = 30, init = 
41, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 83, boot = 42, init = 
41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 87, boot = 46, init = 
40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 90, boot = 49, init = 
40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 94, boot = 53, init = 
40, finish = 1
2020-06-28 07:09:57 INFO Executor:54 - Finished task 5.0 in stage 0.0 (TID 5). 
1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 17.0 in stage 0.0 (TID 
17). 1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 13.0 in stage 0.0 (TID 
13). 1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 18.0 in stage 0.0 (TID 
18). 1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 14.0 in stage 0.0 (TID 
14). 1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 12.0 in stage 0.0 (TID 
12). 1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 
1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 7.0 in stage 0.0 (TID 7). 
1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 6.0 in stage 0.0 (TID 6). 
1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 8.0 in stage 0.0 (TID 8). 
1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 15.0 in stage 0.0 (TID 
15). 1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 19.0 in stage 0.0 (TID 
19). 1418 bytes result sent to driver
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 105, boot = 63, init 
= 42, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 108, boot = 67, init 
= 40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 98, boot = 57, init = 
40, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 101, boot = 59, init 
= 42, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 110, boot = 69, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO Executor:54 - Finished task 22.0 in stage 0.0 (TID 
22). 1418 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 
1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 9.0 in stage 0.0 (TID 9). 
1461 bytes result sent to driver
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 114, boot = 73, init 
= 41, finish = 0
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 113, boot = 71, init 
= 41, finish = 1
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 116, boot = 75, init 
= 40, finish = 1
2020-06-28 07:09:57 INFO Executor:54 - Finished task 3.0 in stage 0.0 (TID 3). 
1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 4.0 in stage 0.0 (TID 4). 
1461 bytes result sent to driver
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 5.0 in stage 0.0 
(TID 5) in 493 ms on localhost (executor driver) (1/24)
2020-06-28 07:09:57 INFO Executor:54 - Finished task 20.0 in stage 0.0 (TID 
20). 1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 16.0 in stage 0.0 (TID 
16). 1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 21.0 in stage 0.0 (TID 
21). 1461 bytes result sent to driver
2020-06-28 07:09:57 INFO Executor:54 - Finished task 2.0 in stage 0.0 (TID 2). 
1461 bytes result sent to driver
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 13.0 in stage 0.0 
(TID 13) in 494 ms on localhost (executor driver) (2/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 17.0 in stage 0.0 
(TID 17) in 493 ms on localhost (executor driver) (3/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 18.0 in stage 0.0 
(TID 18) in 494 ms on localhost (executor driver) (4/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 14.0 in stage 0.0 
(TID 14) in 496 ms on localhost (executor driver) (5/24)
2020-06-28 07:09:57 ERROR PythonRunner:91 - Python worker exited unexpectedly 
(crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py",
 line 238, in main
 eval_type = read_int(infile)
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py",
 line 692, in read_int
 raise EOFError
EOFError
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:336)
 at 
org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:475)
 at 
org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:458)
 at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:290)
 at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
 at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
 at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
 at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
 at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
 at 
org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
 at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
 at 
org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
 at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
 at org.apache.spark.scheduler.Task.run(Task.scala:109)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: net.razorvine.pickle.PickleException: expected zero arguments for 
construction of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 INFO PythonRunner:54 - Times: total = 89, boot = 15, init = 
74, finish = 0
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 12.0 in stage 0.0 
(TID 12) in 496 ms on localhost (executor driver) (6/24)
2020-06-28 07:09:57 ERROR PythonRunner:91 - Python worker exited unexpectedly 
(crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py",
 line 238, in main
 eval_type = read_int(infile)
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py",
 line 692, in read_int
 raise EOFError
EOFError
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:336)
 at 
org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:475)
 at 
org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:458)
 at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:290)
 at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
 at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
 at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
 at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
 at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
 at 
org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
 at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
 at 
org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
 at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
 at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
 at org.apache.spark.scheduler.Task.run(Task.scala:109)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: net.razorvine.pickle.PickleException: expected zero arguments for 
construction of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 
(TID 1) in 504 ms on localhost (executor driver) (7/24)
2020-06-28 07:09:57 ERROR PythonRunner:91 - This may have been caused by a 
prior exception:
net.razorvine.pickle.PickleException: expected zero arguments for construction 
of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 INFO PythonAccumulatorV2:54 - Connected to 
AccumulatorServer at host: 127.0.0.1 port: 42450
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 7.0 in stage 0.0 
(TID 7) in 502 ms on localhost (executor driver) (8/24)
2020-06-28 07:09:57 ERROR PythonRunner:91 - This may have been caused by a 
prior exception:
net.razorvine.pickle.PickleException: expected zero arguments for construction 
of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 INFO Executor:54 - Finished task 10.0 in stage 0.0 (TID 
10). 1461 bytes result sent to driver
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 6.0 in stage 0.0 
(TID 6) in 504 ms on localhost (executor driver) (9/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 8.0 in stage 0.0 
(TID 8) in 503 ms on localhost (executor driver) (10/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 15.0 in stage 0.0 
(TID 15) in 500 ms on localhost (executor driver) (11/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 19.0 in stage 0.0 
(TID 19) in 499 ms on localhost (executor driver) (12/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 22.0 in stage 0.0 
(TID 22) in 498 ms on localhost (executor driver) (13/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 
(TID 0) in 523 ms on localhost (executor driver) (14/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 9.0 in stage 0.0 
(TID 9) in 505 ms on localhost (executor driver) (15/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 4.0 in stage 0.0 
(TID 4) in 507 ms on localhost (executor driver) (16/24)
2020-06-28 07:09:57 ERROR Executor:91 - Exception in task 23.0 in stage 0.0 
(TID 23)
net.razorvine.pickle.PickleException: expected zero arguments for construction 
of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 
(TID 3) in 508 ms on localhost (executor driver) (17/24)
2020-06-28 07:09:57 ERROR Executor:91 - Exception in task 11.0 in stage 0.0 
(TID 11)
net.razorvine.pickle.PickleException: expected zero arguments for construction 
of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 20.0 in stage 0.0 
(TID 20) in 501 ms on localhost (executor driver) (18/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 21.0 in stage 0.0 
(TID 21) in 501 ms on localhost (executor driver) (19/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 16.0 in stage 0.0 
(TID 16) in 503 ms on localhost (executor driver) (20/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 
(TID 2) in 509 ms on localhost (executor driver) (21/24)
2020-06-28 07:09:57 INFO TaskSetManager:54 - Finished task 10.0 in stage 0.0 
(TID 10) in 506 ms on localhost (executor driver) (22/24)
2020-06-28 07:09:57 WARN TaskSetManager:66 - Lost task 11.0 in stage 0.0 (TID 
11, localhost, executor driver): net.razorvine.pickle.PickleException: expected 
zero arguments for construction of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 ERROR TaskSetManager:70 - Task 11 in stage 0.0 failed 1 
times; aborting job
2020-06-28 07:09:57 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose 
tasks have all completed, from pool 
2020-06-28 07:09:57 INFO TaskSetManager:54 - Lost task 23.0 in stage 0.0 (TID 
23) on localhost, executor driver: net.razorvine.pickle.PickleException 
(expected zero arguments for construction of ClassDict (for 
pyarrow.lib.type_for_alias)) [duplicate 1]
2020-06-28 07:09:57 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose 
tasks have all completed, from pool 
2020-06-28 07:09:57 INFO TaskSchedulerImpl:54 - Cancelling stage 0
2020-06-28 07:09:57 INFO DAGScheduler:54 - ResultStage 0 (count at 
/nfs/home3/tahmad/tahmad/script.py:301) failed in 0.681 s due to Job aborted 
due to stage failure: Task 11 in stage 0.0 failed 1 times, most recent failure: 
Lost task 11.0 in stage 0.0 (TID 11, localhost, executor driver): 
net.razorvine.pickle.PickleException: expected zero arguments for construction 
of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
Driver stacktrace:
2020-06-28 07:09:57 INFO DAGScheduler:54 - Job 0 failed: count at 
/nfs/home3/tahmad/tahmad/script.py:301, took 0.731943 s
Traceback (most recent call last):
 File "/nfs/home3/tahmad/tahmad/script.py", line 301, in <module>
 print(converted_ardd.count())
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py",
 line 1053, in count
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py",
 line 1044, in sum
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py",
 line 915, in fold
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py",
 line 814, in collect
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
 line 1257, in __call__
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py",
 line 63, in deco
 File 
"/home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
 line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 11 in 
stage 0.0 failed 1 times, most recent failure: Lost task 11.0 in stage 0.0 (TID 
11, localhost, executor driver): net.razorvine.pickle.PickleException: expected 
zero arguments for construction of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
Driver stacktrace:
 at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1661)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1649)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1648)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1648)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
 at scala.Option.foreach(Option.scala:257)
 at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1882)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1831)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1820)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
 at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
 at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
 at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:165)
 at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
 at py4j.Gateway.invoke(Gateway.java:282)
 at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
 at py4j.commands.CallCommand.execute(CallCommand.java:79)
 at py4j.GatewayConnection.run(GatewayConnection.java:238)
 at java.lang.Thread.run(Thread.java:748)
Caused by: net.razorvine.pickle.PickleException: expected zero arguments for 
construction of ClassDict (for pyarrow.lib.type_for_alias)
 at 
net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:188)
 at 
org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:187)
 at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
 at scala.collection.Iterator$class.foreach(Iterator.scala:893)
 at 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
 at 
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:223)
 at 
org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:444)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:250)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
 at 
org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:173)
2020-06-28 07:09:57 INFO SparkContext:54 - Invoking stop() from shutdown hook
2020-06-28 07:09:57 INFO AbstractConnector:318 - Stopped 
Spark@27de2e9{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-06-28 07:09:57 INFO SparkUI:54 - Stopped Spark web UI at http://tcn862:4040
2020-06-28 07:09:57 INFO MapOutputTrackerMasterEndpoint:54 - 
MapOutputTrackerMasterEndpoint stopped!
2020-06-28 07:09:57 INFO MemoryStore:54 - MemoryStore cleared
2020-06-28 07:09:57 INFO BlockManager:54 - BlockManager stopped
2020-06-28 07:09:57 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2020-06-28 07:09:57 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - 
OutputCommitCoordinator stopped!
2020-06-28 07:09:57 INFO SparkContext:54 - Successfully stopped SparkContext
2020-06-28 07:09:57 INFO ShutdownHookManager:54 - Shutdown hook called
2020-06-28 07:09:57 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-405f8ca1-a57f-4cae-8fa4-d459dd74b5d7/pyspark-6272138d-ab9b-414e-9593-7406c89da076
2020-06-28 07:09:57 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-a864ecd8-9b40-4332-a06b-46f3f908421c
2020-06-28 07:09:57 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-405f8ca1-a57f-4cae-8fa4-d459dd74b5d7
{code}
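
If going through pandas is acceptable, the conversion does not need the RDD round-trip at all. A minimal sketch, again reusing {{spark}} and {{batch}} from the snippet above:

{code:python}
import pyarrow as pa

# Combine one or more record batches into a Table, materialize it as a
# pandas DataFrame, and let Spark build the DataFrame from that (with
# spark.sql.execution.arrow.enabled=true, Spark uses Arrow internally for
# the pandas conversion).
table = pa.Table.from_batches([batch])
df = spark.createDataFrame(table.to_pandas())
df.show()
{code}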