Hi Don,
This seems related to a known issue where the driver's classpath is
missing the relevant classes. It's a bug in py4j: py4j uses the system
classloader rather than Spark's context classloader. However, this
problem existed in 1.3.0 as well, so I'm curious whether it's the same
issue. Thanks for opening the JIRA; I'll take a look.
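
To illustrate the difference (a rough sketch on the JVM side, not the
actual Spark code path; it assumes spark-csv's usual DefaultSource
class name):

    // Resolving against the system classpath -- roughly what the py4j
    // entry point ends up doing. Jars added later via --packages are
    // not visible here, so this throws ClassNotFoundException:
    val viaSystem = Class.forName("com.databricks.spark.csv.DefaultSource")

    // Resolving via the thread's context classloader, which is where
    // Spark exposes the --packages jars on the driver:
    val loader = Thread.currentThread().getContextClassLoader
    val viaContext =
      Class.forName("com.databricks.spark.csv.DefaultSource", true, loader)

Until that's fixed, explicitly adding the ivy jars to the driver
classpath (e.g. --driver-class-path pointing at the jars under
~/.ivy2/jars) may work around it, though I haven't verified that on
1.4.0.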

Best,
Burak
On Jun 14, 2015 2:40 PM, "Don Drake" <dondr...@gmail.com> wrote:

>
> I looked at this again, and when I use the Scala spark-shell and load a
> CSV using the same package, it works just fine, so this seems specific
> to pyspark.
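>
> Roughly what I ran in the Scala shell (a reconstruction, not the
> verbatim session; the file path is just a placeholder):
>
>     val df = sqlContext.read.format("com.databricks.spark.csv")
>       .load("/tmp/some.csv")
>     df.show()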
>
> I've created the following JIRA:
> https://issues.apache.org/jira/browse/SPARK-8365
>
> -Don
>
> On Sat, Jun 13, 2015 at 11:46 AM, Don Drake <dondr...@gmail.com> wrote:
>
>> I downloaded the pre-built Spark 1.4.0 distribution and attempted to run
>> an existing Python Spark application against it, and got the following
>> error:
>>
>> py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.
>> : java.lang.RuntimeException: Failed to load class for data source:
>> com.databricks.spark.csv
>>
>> I pass the following on the command line to spark-submit:
>> --packages com.databricks:spark-csv_2.10:1.0.3
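>>
>> i.e. an invocation of the form (the script name is just a placeholder):
>>
>> $ spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 my_app.py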
>>
>> This worked fine on 1.3.1, but not in 1.4.
>>
>> I was able to replicate it with the following in pyspark:
>>
>> a = {'a':1.0, 'b':'asdf'}
>> rdd = sc.parallelize([a])
>> df = sqlContext.createDataFrame(rdd)
>> df.save("/tmp/d.csv", "com.databricks.spark.csv")
>>
>>
>> Even using the new writer API,
>> df.write.format('com.databricks.spark.csv').save('/tmp/d.csv'), gives
>> the same error.
>>
>> I see the jars were added, per the "Added By User" entries in the web
>> UI:
>>
>> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar
>> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar
>> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar
>> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar
>>
>> Thoughts?
>>
>> -Don
>>
>>
>>
>> Gory details:
>>
>> $ pyspark --packages "com.databricks:spark-csv_2.10:1.0.3"
>> Python 2.7.6 (default, Sep  9 2014, 15:04:36)
>> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
>> Type "help", "copyright", "credits" or "license" for more information.
>> Ivy Default Cache set to: /Users/drake/.ivy2/cache
>> The jars for the packages stored in: /Users/drake/.ivy2/jars
>> :: loading settings :: url =
>> jar:file:/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
>> com.databricks#spark-csv_2.10 added as a dependency
>> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>> confs: [default]
>> found com.databricks#spark-csv_2.10;1.0.3 in central
>> found org.apache.commons#commons-csv;1.1 in central
>> :: resolution report :: resolve 590ms :: artifacts dl 17ms
>> :: modules in use:
>> com.databricks#spark-csv_2.10;1.0.3 from central in [default]
>> org.apache.commons#commons-csv;1.1 from central in [default]
>> ---------------------------------------------------------------------
>> |                  |            modules            ||   artifacts   |
>> |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>> ---------------------------------------------------------------------
>> |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
>> ---------------------------------------------------------------------
>> :: retrieving :: org.apache.spark#spark-submit-parent
>> confs: [default]
>> 0 artifacts copied, 2 already retrieved (0kB/15ms)
>> Using Spark's default log4j profile:
>> org/apache/spark/log4j-defaults.properties
>> 15/06/13 11:06:08 INFO SparkContext: Running Spark version 1.4.0
>> 2015-06-13 11:06:08.921 java[19233:2145789] Unable to load realm info
>> from SCDynamicStore
>> 15/06/13 11:06:09 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 15/06/13 11:06:09 WARN Utils: Your hostname, Dons-MacBook-Pro-2.local
>> resolves to a loopback address: 127.0.0.1; using 10.0.0.222 instead (on
>> interface en0)
>> 15/06/13 11:06:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
>> another address
>> 15/06/13 11:06:09 INFO SecurityManager: Changing view acls to: drake
>> 15/06/13 11:06:09 INFO SecurityManager: Changing modify acls to: drake
>> 15/06/13 11:06:09 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users with view permissions: Set(drake); users
>> with modify permissions: Set(drake)
>> 15/06/13 11:06:10 INFO Slf4jLogger: Slf4jLogger started
>> 15/06/13 11:06:10 INFO Remoting: Starting remoting
>> 15/06/13 11:06:10 INFO Remoting: Remoting started; listening on addresses
>> :[akka.tcp://sparkDriver@10.0.0.222:56870]
>> 15/06/13 11:06:10 INFO Utils: Successfully started service 'sparkDriver'
>> on port 56870.
>> 15/06/13 11:06:10 INFO SparkEnv: Registering MapOutputTracker
>> 15/06/13 11:06:10 INFO SparkEnv: Registering BlockManagerMaster
>> 15/06/13 11:06:10 INFO DiskBlockManager: Created local directory at
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/blockmgr-a1412b71-fe56-429c-a193-ce3fb95d2ffd
>> 15/06/13 11:06:10 INFO MemoryStore: MemoryStore started with capacity
>> 265.4 MB
>> 15/06/13 11:06:10 INFO HttpFileServer: HTTP File server directory is
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/httpd-84d178da-7e60-4eed-8031-e6a0c465bd4c
>> 15/06/13 11:06:10 INFO HttpServer: Starting HTTP Server
>> 15/06/13 11:06:10 INFO Utils: Successfully started service 'HTTP file
>> server' on port 56871.
>> 15/06/13 11:06:10 INFO SparkEnv: Registering OutputCommitCoordinator
>> 15/06/13 11:06:11 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> 15/06/13 11:06:11 INFO Utils: Successfully started service 'SparkUI' on
>> port 4041.
>> 15/06/13 11:06:11 INFO SparkUI: Started SparkUI at http://10.0.0.222:4041
>> 15/06/13 11:06:11 INFO SparkContext: Added JAR
>> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar at
>> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar
>> with timestamp 1434211571303
>> 15/06/13 11:06:11 INFO SparkContext: Added JAR
>> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar at
>> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar with
>> timestamp 1434211571326
>> 15/06/13 11:06:11 INFO Utils: Copying
>> /Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
>> 15/06/13 11:06:11 INFO SparkContext: Added file
>> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar at
>> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar with
>> timestamp 1434211571468
>> 15/06/13 11:06:11 INFO Utils: Copying
>> /Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
>> 15/06/13 11:06:11 INFO SparkContext: Added file
>> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar at
>> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar with
>> timestamp 1434211571502
>> 15/06/13 11:06:11 INFO Executor: Starting executor ID driver on host
>> localhost
>> 15/06/13 11:06:11 INFO Utils: Successfully started service
>> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56872.
>> 15/06/13 11:06:11 INFO NettyBlockTransferService: Server created on 56872
>> 15/06/13 11:06:11 INFO BlockManagerMaster: Trying to register BlockManager
>> 15/06/13 11:06:11 INFO BlockManagerMasterEndpoint: Registering block
>> manager localhost:56872 with 265.4 MB RAM, BlockManagerId(driver,
>> localhost, 56872)
>> 15/06/13 11:06:11 INFO BlockManagerMaster: Registered BlockManager
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /__ / .__/\_,_/_/ /_/\_\   version 1.4.0
>>       /_/
>>
>> Using Python version 2.7.6 (default, Sep  9 2014 15:04:36)
>> SparkContext available as sc, HiveContext available as sqlContext.
>> >>> a = {'a':1.0, 'b':'asdf'}
>> >>> rdd = sc.parallelize([a])
>> >>> df = sqlContext.createDataFrame(rdd)
>> 15/06/13 11:06:50 INFO SparkContext: Starting job: runJob at
>> PythonRDD.scala:366
>> 15/06/13 11:06:50 INFO DAGScheduler: Got job 0 (runJob at
>> PythonRDD.scala:366) with 1 output partitions (allowLocal=true)
>> 15/06/13 11:06:50 INFO DAGScheduler: Final stage: ResultStage 0(runJob at
>> PythonRDD.scala:366)
>> 15/06/13 11:06:50 INFO DAGScheduler: Parents of final stage: List()
>> 15/06/13 11:06:50 INFO DAGScheduler: Missing parents: List()
>> 15/06/13 11:06:50 INFO DAGScheduler: Submitting ResultStage 0
>> (PythonRDD[1] at RDD at PythonRDD.scala:43), which has no missing parents
>> 15/06/13 11:06:51 INFO MemoryStore: ensureFreeSpace(3672) called with
>> curMem=0, maxMem=278302556
>> 15/06/13 11:06:51 INFO MemoryStore: Block broadcast_0 stored as values in
>> memory (estimated size 3.6 KB, free 265.4 MB)
>> 15/06/13 11:06:51 INFO MemoryStore: ensureFreeSpace(2328) called with
>> curMem=3672, maxMem=278302556
>> 15/06/13 11:06:51 INFO MemoryStore: Block broadcast_0_piece0 stored as
>> bytes in memory (estimated size 2.3 KB, free 265.4 MB)
>> 15/06/13 11:06:51 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>> memory on localhost:56872 (size: 2.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:51 INFO SparkContext: Created broadcast 0 from broadcast
>> at DAGScheduler.scala:874
>> 15/06/13 11:06:51 INFO DAGScheduler: Submitting 1 missing tasks from
>> ResultStage 0 (PythonRDD[1] at RDD at PythonRDD.scala:43)
>> 15/06/13 11:06:51 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
>> 15/06/13 11:06:51 INFO TaskSetManager: Starting task 0.0 in stage 0.0
>> (TID 0, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
>> 15/06/13 11:06:51 INFO Executor: Fetching
>> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar with
>> timestamp 1434211571502
>> 15/06/13 11:06:51 INFO Utils:
>> /Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar has been
>> previously copied to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
>> 15/06/13 11:06:51 INFO Executor: Fetching
>> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar with
>> timestamp 1434211571468
>> 15/06/13 11:06:51 INFO Utils:
>> /Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar has been
>> previously copied to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
>> 15/06/13 11:06:51 INFO Executor: Fetching
>> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar with
>> timestamp 1434211571326
>> 15/06/13 11:06:51 INFO Utils: Fetching
>> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp2449082240048543653.tmp
>> 15/06/13 11:06:51 INFO Utils:
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp2449082240048543653.tmp
>> has been previously copied to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
>> 15/06/13 11:06:51 INFO Executor: Adding
>> file:/private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
>> to class loader
>> 15/06/13 11:06:51 INFO Executor: Fetching
>> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar
>> with timestamp 1434211571303
>> 15/06/13 11:06:51 INFO Utils: Fetching
>> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp1396931258018379545.tmp
>> 15/06/13 11:06:51 INFO Utils:
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp1396931258018379545.tmp
>> has been previously copied to
>> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
>> 15/06/13 11:06:51 INFO Executor: Adding
>> file:/private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
>> to class loader
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 3165, boot = 3155, init
>> = 10, finish = 0
>> 15/06/13 11:06:54 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 0.0 in stage 0.0
>> (TID 0) in 3505 ms on localhost (1/1)
>> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose
>> tasks have all completed, from pool
>> 15/06/13 11:06:54 INFO DAGScheduler: ResultStage 0 (runJob at
>> PythonRDD.scala:366) finished in 3.525 s
>> 15/06/13 11:06:54 INFO DAGScheduler: Job 0 finished: runJob at
>> PythonRDD.scala:366, took 3.852112 s
>> 15/06/13 11:06:54 INFO SparkContext: Starting job: runJob at
>> PythonRDD.scala:366
>> 15/06/13 11:06:54 INFO DAGScheduler: Got job 1 (runJob at
>> PythonRDD.scala:366) with 4 output partitions (allowLocal=true)
>> 15/06/13 11:06:54 INFO DAGScheduler: Final stage: ResultStage 1(runJob at
>> PythonRDD.scala:366)
>> 15/06/13 11:06:54 INFO DAGScheduler: Parents of final stage: List()
>> 15/06/13 11:06:54 INFO DAGScheduler: Missing parents: List()
>> 15/06/13 11:06:54 INFO DAGScheduler: Submitting ResultStage 1
>> (PythonRDD[2] at RDD at PythonRDD.scala:43), which has no missing parents
>> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(3672) called with
>> curMem=6000, maxMem=278302556
>> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_1 stored as values in
>> memory (estimated size 3.6 KB, free 265.4 MB)
>> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(2330) called with
>> curMem=9672, maxMem=278302556
>> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_1_piece0 stored as
>> bytes in memory (estimated size 2.3 KB, free 265.4 MB)
>> 15/06/13 11:06:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in
>> memory on localhost:56872 (size: 2.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:54 INFO SparkContext: Created broadcast 1 from broadcast
>> at DAGScheduler.scala:874
>> 15/06/13 11:06:54 INFO DAGScheduler: Submitting 4 missing tasks from
>> ResultStage 1 (PythonRDD[2] at RDD at PythonRDD.scala:43)
>> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Adding task set 1.0 with 4 tasks
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 0.0 in stage 1.0
>> (TID 1, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 1.0 in stage 1.0
>> (TID 2, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 2.0 in stage 1.0
>> (TID 3, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 3.0 in stage 1.0
>> (TID 4, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:54 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
>> 15/06/13 11:06:54 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
>> 15/06/13 11:06:54 INFO Executor: Running task 2.0 in stage 1.0 (TID 3)
>> 15/06/13 11:06:54 INFO Executor: Running task 3.0 in stage 1.0 (TID 4)
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -15, init =
>> 17, finish = 0
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 9, boot = 6, init = 2,
>> finish = 1
>> 15/06/13 11:06:54 INFO Executor: Finished task 3.0 in stage 1.0 (TID 4).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:54 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 13, boot = 9, init = 4,
>> finish = 0
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 3.0 in stage 1.0
>> (TID 4) in 24 ms on localhost (1/4)
>> 15/06/13 11:06:54 INFO Executor: Finished task 2.0 in stage 1.0 (TID 3).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 0.0 in stage 1.0
>> (TID 1) in 28 ms on localhost (2/4)
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 2.0 in stage 1.0
>> (TID 3) in 27 ms on localhost (3/4)
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 28, boot = 28, init = 0,
>> finish = 0
>> 15/06/13 11:06:54 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 1.0 in stage 1.0
>> (TID 2) in 42 ms on localhost (4/4)
>> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose
>> tasks have all completed, from pool
>> 15/06/13 11:06:54 INFO DAGScheduler: ResultStage 1 (runJob at
>> PythonRDD.scala:366) finished in 0.044 s
>> 15/06/13 11:06:54 INFO DAGScheduler: Job 1 finished: runJob at
>> PythonRDD.scala:366, took 0.063304 s
>> 15/06/13 11:06:54 INFO SparkContext: Starting job: runJob at
>> PythonRDD.scala:366
>> 15/06/13 11:06:54 INFO DAGScheduler: Got job 2 (runJob at
>> PythonRDD.scala:366) with 3 output partitions (allowLocal=true)
>> 15/06/13 11:06:54 INFO DAGScheduler: Final stage: ResultStage 2(runJob at
>> PythonRDD.scala:366)
>> 15/06/13 11:06:54 INFO DAGScheduler: Parents of final stage: List()
>> 15/06/13 11:06:54 INFO DAGScheduler: Missing parents: List()
>> 15/06/13 11:06:54 INFO DAGScheduler: Submitting ResultStage 2
>> (PythonRDD[3] at RDD at PythonRDD.scala:43), which has no missing parents
>> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(3672) called with
>> curMem=12002, maxMem=278302556
>> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_2 stored as values in
>> memory (estimated size 3.6 KB, free 265.4 MB)
>> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(2330) called with
>> curMem=15674, maxMem=278302556
>> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_2_piece0 stored as
>> bytes in memory (estimated size 2.3 KB, free 265.4 MB)
>> 15/06/13 11:06:54 INFO BlockManagerInfo: Added broadcast_2_piece0 in
>> memory on localhost:56872 (size: 2.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:54 INFO SparkContext: Created broadcast 2 from broadcast
>> at DAGScheduler.scala:874
>> 15/06/13 11:06:54 INFO DAGScheduler: Submitting 3 missing tasks from
>> ResultStage 2 (PythonRDD[3] at RDD at PythonRDD.scala:43)
>> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Adding task set 2.0 with 3 tasks
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 0.0 in stage 2.0
>> (TID 5, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 1.0 in stage 2.0
>> (TID 6, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 2.0 in stage 2.0
>> (TID 7, localhost, PROCESS_LOCAL, 1708 bytes)
>> 15/06/13 11:06:54 INFO Executor: Running task 0.0 in stage 2.0 (TID 5)
>> 15/06/13 11:06:54 INFO Executor: Running task 1.0 in stage 2.0 (TID 6)
>> 15/06/13 11:06:54 INFO Executor: Running task 2.0 in stage 2.0 (TID 7)
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -41, init =
>> 43, finish = 0
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -38, init =
>> 40, finish = 0
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 1, boot = -77, init =
>> 78, finish = 0
>> 15/06/13 11:06:54 INFO Executor: Finished task 0.0 in stage 2.0 (TID 5).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:54 INFO Executor: Finished task 1.0 in stage 2.0 (TID 6).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:54 INFO Executor: Finished task 2.0 in stage 2.0 (TID 7).
>> 722 bytes result sent to driver
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 0.0 in stage 2.0
>> (TID 5) in 14 ms on localhost (1/3)
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 1.0 in stage 2.0
>> (TID 6) in 13 ms on localhost (2/3)
>> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 2.0 in stage 2.0
>> (TID 7) in 13 ms on localhost (3/3)
>> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose
>> tasks have all completed, from pool
>> 15/06/13 11:06:54 INFO DAGScheduler: ResultStage 2 (runJob at
>> PythonRDD.scala:366) finished in 0.016 s
>> 15/06/13 11:06:54 INFO DAGScheduler: Job 2 finished: runJob at
>> PythonRDD.scala:366, took 0.087522 s
>> /Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/context.py:198:
>> UserWarning: Using RDD of dict to inferSchema is deprecated,please use
>> pyspark.sql.Row instead
>>   warnings.warn("Using RDD of dict to inferSchema is deprecated,"
>> 15/06/13 11:06:54 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
>> localhost:56872 in memory (size: 2.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:54 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
>> localhost:56872 in memory (size: 2.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:54 INFO SparkContext: Starting job: runJob at
>> PythonRDD.scala:366
>> 15/06/13 11:06:54 INFO DAGScheduler: Got job 3 (runJob at
>> PythonRDD.scala:366) with 1 output partitions (allowLocal=true)
>> 15/06/13 11:06:54 INFO DAGScheduler: Final stage: ResultStage 3(runJob at
>> PythonRDD.scala:366)
>> 15/06/13 11:06:54 INFO DAGScheduler: Parents of final stage: List()
>> 15/06/13 11:06:54 INFO DAGScheduler: Missing parents: List()
>> 15/06/13 11:06:54 INFO DAGScheduler: Submitting ResultStage 3
>> (PythonRDD[4] at RDD at PythonRDD.scala:43), which has no missing parents
>> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(5120) called with
>> curMem=6002, maxMem=278302556
>> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_3 stored as values in
>> memory (estimated size 5.0 KB, free 265.4 MB)
>> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(3338) called with
>> curMem=11122, maxMem=278302556
>> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_3_piece0 stored as
>> bytes in memory (estimated size 3.3 KB, free 265.4 MB)
>> 15/06/13 11:06:54 INFO BlockManagerInfo: Added broadcast_3_piece0 in
>> memory on localhost:56872 (size: 3.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:54 INFO SparkContext: Created broadcast 3 from broadcast
>> at DAGScheduler.scala:874
>> 15/06/13 11:06:54 INFO DAGScheduler: Submitting 1 missing tasks from
>> ResultStage 3 (PythonRDD[4] at RDD at PythonRDD.scala:43)
>> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
>> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 0.0 in stage 3.0
>> (TID 8, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:54 INFO Executor: Running task 0.0 in stage 3.0 (TID 8)
>> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -29, init =
>> 31, finish = 0
>> 15/06/13 11:06:55 INFO Executor: Finished task 0.0 in stage 3.0 (TID 8).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 0.0 in stage 3.0
>> (TID 8) in 10 ms on localhost (1/1)
>> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose
>> tasks have all completed, from pool
>> 15/06/13 11:06:55 INFO DAGScheduler: ResultStage 3 (runJob at
>> PythonRDD.scala:366) finished in 0.011 s
>> 15/06/13 11:06:55 INFO DAGScheduler: Job 3 finished: runJob at
>> PythonRDD.scala:366, took 0.035088 s
>> 15/06/13 11:06:55 INFO SparkContext: Starting job: runJob at
>> PythonRDD.scala:366
>> 15/06/13 11:06:55 INFO DAGScheduler: Got job 4 (runJob at
>> PythonRDD.scala:366) with 4 output partitions (allowLocal=true)
>> 15/06/13 11:06:55 INFO DAGScheduler: Final stage: ResultStage 4(runJob at
>> PythonRDD.scala:366)
>> 15/06/13 11:06:55 INFO DAGScheduler: Parents of final stage: List()
>> 15/06/13 11:06:55 INFO DAGScheduler: Missing parents: List()
>> 15/06/13 11:06:55 INFO DAGScheduler: Submitting ResultStage 4
>> (PythonRDD[5] at RDD at PythonRDD.scala:43), which has no missing parents
>> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(5120) called with
>> curMem=14460, maxMem=278302556
>> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_4 stored as values in
>> memory (estimated size 5.0 KB, free 265.4 MB)
>> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(3337) called with
>> curMem=19580, maxMem=278302556
>> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_4_piece0 stored as
>> bytes in memory (estimated size 3.3 KB, free 265.4 MB)
>> 15/06/13 11:06:55 INFO BlockManagerInfo: Added broadcast_4_piece0 in
>> memory on localhost:56872 (size: 3.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:55 INFO SparkContext: Created broadcast 4 from broadcast
>> at DAGScheduler.scala:874
>> 15/06/13 11:06:55 INFO DAGScheduler: Submitting 4 missing tasks from
>> ResultStage 4 (PythonRDD[5] at RDD at PythonRDD.scala:43)
>> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Adding task set 4.0 with 4 tasks
>> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 0.0 in stage 4.0
>> (TID 9, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 1.0 in stage 4.0
>> (TID 10, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 2.0 in stage 4.0
>> (TID 11, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 3.0 in stage 4.0
>> (TID 12, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:55 INFO Executor: Running task 3.0 in stage 4.0 (TID 12)
>> 15/06/13 11:06:55 INFO Executor: Running task 1.0 in stage 4.0 (TID 10)
>> 15/06/13 11:06:55 INFO Executor: Running task 2.0 in stage 4.0 (TID 11)
>> 15/06/13 11:06:55 INFO Executor: Running task 0.0 in stage 4.0 (TID 9)
>> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -79, init =
>> 80, finish = 0
>> 15/06/13 11:06:55 INFO Executor: Finished task 3.0 in stage 4.0 (TID 12).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -27, init =
>> 28, finish = 0
>> 15/06/13 11:06:55 INFO Executor: Finished task 1.0 in stage 4.0 (TID 10).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 3.0 in stage 4.0
>> (TID 12) in 11 ms on localhost (1/4)
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 1.0 in stage 4.0
>> (TID 10) in 14 ms on localhost (2/4)
>> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 22, boot = 22, init = 0,
>> finish = 0
>> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 21, boot = 21, init = 0,
>> finish = 0
>> 15/06/13 11:06:55 INFO Executor: Finished task 2.0 in stage 4.0 (TID 11).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:55 INFO Executor: Finished task 0.0 in stage 4.0 (TID 9).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 2.0 in stage 4.0
>> (TID 11) in 37 ms on localhost (3/4)
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 0.0 in stage 4.0
>> (TID 9) in 43 ms on localhost (4/4)
>> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose
>> tasks have all completed, from pool
>> 15/06/13 11:06:55 INFO DAGScheduler: ResultStage 4 (runJob at
>> PythonRDD.scala:366) finished in 0.044 s
>> 15/06/13 11:06:55 INFO DAGScheduler: Job 4 finished: runJob at
>> PythonRDD.scala:366, took 0.059163 s
>> 15/06/13 11:06:55 INFO SparkContext: Starting job: runJob at
>> PythonRDD.scala:366
>> 15/06/13 11:06:55 INFO DAGScheduler: Got job 5 (runJob at
>> PythonRDD.scala:366) with 3 output partitions (allowLocal=true)
>> 15/06/13 11:06:55 INFO DAGScheduler: Final stage: ResultStage 5(runJob at
>> PythonRDD.scala:366)
>> 15/06/13 11:06:55 INFO DAGScheduler: Parents of final stage: List()
>> 15/06/13 11:06:55 INFO DAGScheduler: Missing parents: List()
>> 15/06/13 11:06:55 INFO DAGScheduler: Submitting ResultStage 5
>> (PythonRDD[6] at RDD at PythonRDD.scala:43), which has no missing parents
>> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(5120) called with
>> curMem=22917, maxMem=278302556
>> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_5 stored as values in
>> memory (estimated size 5.0 KB, free 265.4 MB)
>> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(3338) called with
>> curMem=28037, maxMem=278302556
>> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_5_piece0 stored as
>> bytes in memory (estimated size 3.3 KB, free 265.4 MB)
>> 15/06/13 11:06:55 INFO BlockManagerInfo: Added broadcast_5_piece0 in
>> memory on localhost:56872 (size: 3.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:55 INFO SparkContext: Created broadcast 5 from broadcast
>> at DAGScheduler.scala:874
>> 15/06/13 11:06:55 INFO DAGScheduler: Submitting 3 missing tasks from
>> ResultStage 5 (PythonRDD[6] at RDD at PythonRDD.scala:43)
>> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Adding task set 5.0 with 3 tasks
>> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 0.0 in stage 5.0
>> (TID 13, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 1.0 in stage 5.0
>> (TID 14, localhost, PROCESS_LOCAL, 1665 bytes)
>> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 2.0 in stage 5.0
>> (TID 15, localhost, PROCESS_LOCAL, 1708 bytes)
>> 15/06/13 11:06:55 INFO Executor: Running task 0.0 in stage 5.0 (TID 13)
>> 15/06/13 11:06:55 INFO Executor: Running task 1.0 in stage 5.0 (TID 14)
>> 15/06/13 11:06:55 INFO Executor: Running task 2.0 in stage 5.0 (TID 15)
>> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -24, init =
>> 25, finish = 0
>> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -24, init =
>> 25, finish = 0
>> 15/06/13 11:06:55 INFO Executor: Finished task 0.0 in stage 5.0 (TID 13).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:55 INFO Executor: Finished task 2.0 in stage 5.0 (TID 15).
>> 716 bytes result sent to driver
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 0.0 in stage 5.0
>> (TID 13) in 12 ms on localhost (1/3)
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 2.0 in stage 5.0
>> (TID 15) in 11 ms on localhost (2/3)
>> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 21, boot = 20, init = 1,
>> finish = 0
>> 15/06/13 11:06:55 INFO Executor: Finished task 1.0 in stage 5.0 (TID 14).
>> 666 bytes result sent to driver
>> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 1.0 in stage 5.0
>> (TID 14) in 36 ms on localhost (3/3)
>> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose
>> tasks have all completed, from pool
>> 15/06/13 11:06:55 INFO DAGScheduler: ResultStage 5 (runJob at
>> PythonRDD.scala:366) finished in 0.038 s
>> 15/06/13 11:06:55 INFO DAGScheduler: Job 5 finished: runJob at
>> PythonRDD.scala:366, took 0.052978 s
>> 15/06/13 11:06:56 INFO HiveContext: Initializing execution hive, version
>> 0.13.1
>> 15/06/13 11:06:56 INFO HiveMetaStore: 0: Opening raw store with
>> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>> 15/06/13 11:06:56 INFO ObjectStore: ObjectStore, initialize called
>> 15/06/13 11:06:57 INFO Persistence: Property datanucleus.cache.level2
>> unknown - will be ignored
>> 15/06/13 11:06:57 INFO Persistence: Property
>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>> 15/06/13 11:06:57 WARN Connection: BoneCP specified but not present in
>> CLASSPATH (or one of dependencies)
>> 15/06/13 11:06:58 WARN Connection: BoneCP specified but not present in
>> CLASSPATH (or one of dependencies)
>> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_5_piece0 on
>> localhost:56872 in memory (size: 3.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_4_piece0 on
>> localhost:56872 in memory (size: 3.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_3_piece0 on
>> localhost:56872 in memory (size: 3.3 KB, free: 265.4 MB)
>> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
>> localhost:56872 in memory (size: 2.3 KB, free: 265.4 MB)
>> 15/06/13 11:07:00 INFO ObjectStore: Setting MetaStore object pin classes
>> with
>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>> 15/06/13 11:07:00 INFO MetaStoreDirectSql: MySQL check failed, assuming
>> we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@"
>> (64), after : "".
>> 15/06/13 11:07:01 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:01 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:04 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:04 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:04 INFO ObjectStore: Initialized ObjectStore
>> 15/06/13 11:07:04 WARN ObjectStore: Version information not found in
>> metastore. hive.metastore.schema.verification is not enabled so recording
>> the schema version 0.13.1aa
>> 15/06/13 11:07:05 INFO HiveMetaStore: Added admin role in metastore
>> 15/06/13 11:07:05 INFO HiveMetaStore: Added public role in metastore
>> 15/06/13 11:07:05 INFO HiveMetaStore: No user is added in admin role,
>> since config is empty
>> 15/06/13 11:07:05 INFO SessionState: No Tez session required at this
>> point. hive.execution.engine=mr.
>> 15/06/13 11:07:06 INFO HiveContext: Initializing HiveMetastoreConnection
>> version 0.13.1 using Spark classes.
>> 15/06/13 11:07:07 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 15/06/13 11:07:08 INFO HiveMetaStore: 0: Opening raw store with
>> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>> 15/06/13 11:07:08 INFO ObjectStore: ObjectStore, initialize called
>> 15/06/13 11:07:08 INFO Persistence: Property datanucleus.cache.level2
>> unknown - will be ignored
>> 15/06/13 11:07:08 INFO Persistence: Property
>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>> 15/06/13 11:07:08 WARN Connection: BoneCP specified but not present in
>> CLASSPATH (or one of dependencies)
>> 15/06/13 11:07:09 WARN Connection: BoneCP specified but not present in
>> CLASSPATH (or one of dependencies)
>> 15/06/13 11:07:11 INFO ObjectStore: Setting MetaStore object pin classes
>> with
>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>> 15/06/13 11:07:11 INFO MetaStoreDirectSql: MySQL check failed, assuming
>> we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@"
>> (64), after : "".
>> 15/06/13 11:07:12 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:12 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:13 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:13 INFO Datastore: The class
>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>> "embedded-only" so does not have its own datastore table.
>> 15/06/13 11:07:14 INFO ObjectStore: Initialized ObjectStore
>> 15/06/13 11:07:14 WARN ObjectStore: Version information not found in
>> metastore. hive.metastore.schema.verification is not enabled so recording
>> the schema version 0.13.1aa
>> 15/06/13 11:07:14 INFO HiveMetaStore: Added admin role in metastore
>> 15/06/13 11:07:15 INFO HiveMetaStore: Added public role in metastore
>> 15/06/13 11:07:15 INFO HiveMetaStore: No user is added in admin role,
>> since config is empty
>> 15/06/13 11:07:15 INFO SessionState: No Tez session required at this
>> point. hive.execution.engine=mr.
>> >>>
>> >>> df.save("/tmp/d.csv", "com.databricks.spark.csv")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File
>> "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/dataframe.py",
>> line 202, in save
>>     return self.write.save(path, source, mode, **options)
>>   File
>> "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py",
>> line 295, in save
>>     self._jwrite.save(path)
>>   File
>> "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>> line 538, in __call__
>>   File
>> "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>> line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling o86.save.
>> : java.lang.RuntimeException: Failed to load class for data source:
>> com.databricks.spark.csv
>> at scala.sys.package$.error(package.scala:27)
>> at
>> org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
>> at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:302)
>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>> at py4j.Gateway.invoke(Gateway.java:259)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>> at java.lang.Thread.run(Thread.java:744)
>>
>> >>>
>> ...
>> >>> df.write.format("com.databricks.spark.csv").save("/tmp/d.csv")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File
>> "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py",
>> line 295, in save
>>     self._jwrite.save(path)
>>   File
>> "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>> line 538, in __call__
>>   File
>> "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>> line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.
>> : java.lang.RuntimeException: Failed to load class for data source:
>> com.databricks.spark.csv
>> at scala.sys.package$.error(package.scala:27)
>> at
>> org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
>> at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:302)
>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>> at py4j.Gateway.invoke(Gateway.java:259)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>> at java.lang.Thread.run(Thread.java:744)
>>
>> >>>
>>
>> --
>> Donald Drake
>> Drake Consulting
>> http://www.drakeconsulting.com/
>> http://www.MailLaunder.com/
>> 800-733-2143
>>
>
>
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> http://www.MailLaunder.com/
> 800-733-2143
>
