[jira] [Updated] (SPARK-8365) pyspark does not retain --packages or --jars passed on the command line as of 1.4.0

Michael Armbrust (JIRA) Wed, 17 Jun 2015 13:56:07 -0700

     [ https://issues.apache.org/jira/browse/SPARK-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael Armbrust updated SPARK-8365:
------------------------------------
    Assignee: Andrew Or

> pyspark does not retain --packages or --jars passed on the command line as of 1.4.0
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-8365
>                 URL: https://issues.apache.org/jira/browse/SPARK-8365
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.4.0
>            Reporter: Don Drake
>            Assignee: Andrew Or
>            Priority: Blocker
>
> I downloaded the pre-compiled Spark 1.4.0 distribution, ran an existing Python
> Spark application against it, and got the following error:
> py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.
> : java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
> I pass the following option to spark-submit on the command line:
> --packages com.databricks:spark-csv_2.10:1.0.3
> This worked fine on 1.3.1, but not in 1.4.
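The --packages value is a comma-separated list of Maven coordinates in groupId:artifactId:version form, as in com.databricks:spark-csv_2.10:1.0.3 above. The sketch below checks that a value has this shape before it is passed to spark-submit or pyspark. It is written for Python 3 (the reporter's shell runs Python 2.7). The names parse_packages and MavenCoordinate are hypothetical helpers, not Spark APIs.

```python
from typing import List, NamedTuple


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_packages(value: str) -> List[MavenCoordinate]:
    """Split a --packages value into groupId:artifactId:version coordinates.

    Raises ValueError for any entry that does not have exactly three
    non-empty colon-separated parts.
    """
    coords = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty entries, e.g. from a trailing comma
        parts = entry.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                "expected groupId:artifactId:version, got {!r}".format(entry))
        coords.append(MavenCoordinate(*parts))
    return coords
```

For example, parse_packages("com.databricks:spark-csv_2.10:1.0.3") yields a single coordinate whose artifact_id is spark-csv_2.10. A malformed entry such as com.databricks:spark-csv_2.10 raises ValueError instead of being passed on silently.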
> I was able to reproduce it in the pyspark shell with the following:
> {code}
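> # Run in a pyspark shell started with --packages; the shell provides sc and sqlContext.
> a = {'a':1.0, 'b':'asdf'}
> rdd = sc.parallelize([a])
> # Inferring a schema from dicts is deprecated in 1.4 (see the UserWarning in the log below).
> df = sqlContext.createDataFrame(rdd)
> # Fails with: Failed to load class for data source: com.databricks.spark.csv
> df.save("/tmp/d.csv", "com.databricks.spark.csv")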
> {code}
> The new API, df.write.format('com.databricks.spark.csv').save('/tmp/d.csv'),
> gives the same error.
> The jars do show up in the web UI:
> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar  Added By User
> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar   Added By User
> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar  Added By User
> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar   Added By User
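The file: paths in this listing follow one naming pattern. The coordinate com.databricks:spark-csv_2.10:1.0.3 is stored as com.databricks_spark-csv_2.10-1.0.3.jar under ~/.ivy2/jars, and its dependency org.apache.commons:commons-csv:1.1 is stored as org.apache.commons_commons-csv-1.1.jar. The sketch below reports which expected jars are absent from the Ivy jar directory. It assumes that pattern holds, although the pattern is inferred from these two paths and not from Spark documentation. The helpers ivy_jar_name and missing_jars are hypothetical.

```python
import os
from typing import Iterable, List, Optional


def ivy_jar_name(coordinate: str) -> str:
    """Map groupId:artifactId:version to groupId_artifactId-version.jar.

    Assumption: this naming is inferred from the ~/.ivy2/jars paths in this report.
    """
    group_id, artifact_id, version = coordinate.split(":")
    return "{}_{}-{}.jar".format(group_id, artifact_id, version)


def missing_jars(coordinates: Iterable[str], jar_dir: Optional[str] = None) -> List[str]:
    """Return the expected jar names that are not present in jar_dir.

    jar_dir defaults to ~/.ivy2/jars; a missing directory counts as empty.
    """
    if jar_dir is None:
        jar_dir = os.path.expanduser("~/.ivy2/jars")
    present = set(os.listdir(jar_dir)) if os.path.isdir(jar_dir) else set()
    return [name for name in map(ivy_jar_name, coordinates) if name not in present]
```

An empty result would suggest that Ivy resolution succeeded. That points the problem at how the jars are made visible to the running application, not at downloading them.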
> Thoughts?
> *I also tried loading a CSV from the Scala spark-shell with the same package,
> and it worked fine, so this seems specific to pyspark.*
> -Don
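You can trace the same jars through a driver log like the one below by extracting the jar-related lines with two regular expressions. This is a sketch under stated assumptions. The patterns are copied from the "SparkContext: Added JAR" and "Executor: Adding ... to class loader" lines in this report. They assume each log entry sits on one line, whereas the copy in this email is wrapped. jar_events is a hypothetical helper, not a Spark tool.

```python
import re
from typing import List, Tuple

# Driver line emitted when a jar is registered with the SparkContext, e.g.
# "INFO SparkContext: Added JAR file:/...jar at http://.../jars/...jar with timestamp 1434211571303".
ADDED_JAR = re.compile(r"SparkContext: Added JAR (\S+) at (\S+) with timestamp (\d+)")

# Executor line emitted when a fetched jar is put on the executor's class loader.
ADDED_TO_LOADER = re.compile(r"Executor: Adding (\S+) to class loader")


def jar_events(log_text: str) -> Tuple[List[str], List[str]]:
    """Return (jars registered by the driver, jars added to the executor class loader).

    Assumes one log entry per line; wrapped copies must be unwrapped first.
    """
    registered = [m.group(1) for m in ADDED_JAR.finditer(log_text)]
    loaded = [m.group(1) for m in ADDED_TO_LOADER.finditer(log_text)]
    return registered, loaded
```

A typical call reads a saved log, e.g. with open("driver.log") as fh: registered, loaded = jar_events(fh.read()), where driver.log is a hypothetical path. Seeing a jar in both lists shows it reached the executor's class loader. That alone does not show that the driver-side call that fails here (o90.save) can load classes from it.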
> Gory details:
> {code}
> $ pyspark --packages "com.databricks:spark-csv_2.10:1.0.3"
> Python 2.7.6 (default, Sep  9 2014, 15:04:36)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Ivy Default Cache set to: /Users/drake/.ivy2/cache
> The jars for the packages stored in: /Users/drake/.ivy2/jars
> :: loading settings :: url = 
> jar:file:/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
> com.databricks#spark-csv_2.10 added as a dependency
> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>       confs: [default]
>       found com.databricks#spark-csv_2.10;1.0.3 in central
>       found org.apache.commons#commons-csv;1.1 in central
> :: resolution report :: resolve 590ms :: artifacts dl 17ms
>       :: modules in use:
>       com.databricks#spark-csv_2.10;1.0.3 from central in [default]
>       org.apache.commons#commons-csv;1.1 from central in [default]
>       ---------------------------------------------------------------------
>       |                  |            modules            ||   artifacts   |
>       |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>       ---------------------------------------------------------------------
>       |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
>       ---------------------------------------------------------------------
> :: retrieving :: org.apache.spark#spark-submit-parent
>       confs: [default]
>       0 artifacts copied, 2 already retrieved (0kB/15ms)
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 15/06/13 11:06:08 INFO SparkContext: Running Spark version 1.4.0
> 2015-06-13 11:06:08.921 java[19233:2145789] Unable to load realm info from 
> SCDynamicStore
> 15/06/13 11:06:09 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 15/06/13 11:06:09 WARN Utils: Your hostname, Dons-MacBook-Pro-2.local 
> resolves to a loopback address: 127.0.0.1; using 10.0.0.222 instead (on 
> interface en0)
> 15/06/13 11:06:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 15/06/13 11:06:09 INFO SecurityManager: Changing view acls to: drake
> 15/06/13 11:06:09 INFO SecurityManager: Changing modify acls to: drake
> 15/06/13 11:06:09 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(drake); users 
> with modify permissions: Set(drake)
> 15/06/13 11:06:10 INFO Slf4jLogger: Slf4jLogger started
> 15/06/13 11:06:10 INFO Remoting: Starting remoting
> 15/06/13 11:06:10 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriver@10.0.0.222:56870]
> 15/06/13 11:06:10 INFO Utils: Successfully started service 'sparkDriver' on 
> port 56870.
> 15/06/13 11:06:10 INFO SparkEnv: Registering MapOutputTracker
> 15/06/13 11:06:10 INFO SparkEnv: Registering BlockManagerMaster
> 15/06/13 11:06:10 INFO DiskBlockManager: Created local directory at 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/blockmgr-a1412b71-fe56-429c-a193-ce3fb95d2ffd
> 15/06/13 11:06:10 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> 15/06/13 11:06:10 INFO HttpFileServer: HTTP File server directory is 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/httpd-84d178da-7e60-4eed-8031-e6a0c465bd4c
> 15/06/13 11:06:10 INFO HttpServer: Starting HTTP Server
> 15/06/13 11:06:10 INFO Utils: Successfully started service 'HTTP file server' 
> on port 56871.
> 15/06/13 11:06:10 INFO SparkEnv: Registering OutputCommitCoordinator
> 15/06/13 11:06:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
> Attempting port 4041.
> 15/06/13 11:06:11 INFO Utils: Successfully started service 'SparkUI' on port 
> 4041.
> 15/06/13 11:06:11 INFO SparkUI: Started SparkUI at http://10.0.0.222:4041
> 15/06/13 11:06:11 INFO SparkContext: Added JAR 
> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar at 
> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar with 
> timestamp 1434211571303
> 15/06/13 11:06:11 INFO SparkContext: Added JAR 
> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar at 
> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar with 
> timestamp 1434211571326
> 15/06/13 11:06:11 INFO Utils: Copying 
> /Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
> 15/06/13 11:06:11 INFO SparkContext: Added file 
> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar at 
> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar with 
> timestamp 1434211571468
> 15/06/13 11:06:11 INFO Utils: Copying 
> /Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
> 15/06/13 11:06:11 INFO SparkContext: Added file 
> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar at 
> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar with 
> timestamp 1434211571502
> 15/06/13 11:06:11 INFO Executor: Starting executor ID driver on host localhost
> 15/06/13 11:06:11 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56872.
> 15/06/13 11:06:11 INFO NettyBlockTransferService: Server created on 56872
> 15/06/13 11:06:11 INFO BlockManagerMaster: Trying to register BlockManager
> 15/06/13 11:06:11 INFO BlockManagerMasterEndpoint: Registering block manager 
> localhost:56872 with 265.4 MB RAM, BlockManagerId(driver, localhost, 56872)
> 15/06/13 11:06:11 INFO BlockManagerMaster: Registered BlockManager
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 1.4.0
>       /_/
> Using Python version 2.7.6 (default, Sep  9 2014 15:04:36)
> SparkContext available as sc, HiveContext available as sqlContext.
> >>> a = {'a':1.0, 'b':'asdf'}
> >>> rdd = sc.parallelize([a])
> >>> df = sqlContext.createDataFrame(rdd)
> 15/06/13 11:06:50 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:366
> 15/06/13 11:06:50 INFO DAGScheduler: Got job 0 (runJob at 
> PythonRDD.scala:366) with 1 output partitions (allowLocal=true)
> 15/06/13 11:06:50 INFO DAGScheduler: Final stage: ResultStage 0(runJob at 
> PythonRDD.scala:366)
> 15/06/13 11:06:50 INFO DAGScheduler: Parents of final stage: List()
> 15/06/13 11:06:50 INFO DAGScheduler: Missing parents: List()
> 15/06/13 11:06:50 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] 
> at RDD at PythonRDD.scala:43), which has no missing parents
> 15/06/13 11:06:51 INFO MemoryStore: ensureFreeSpace(3672) called with 
> curMem=0, maxMem=278302556
> 15/06/13 11:06:51 INFO MemoryStore: Block broadcast_0 stored as values in 
> memory (estimated size 3.6 KB, free 265.4 MB)
> 15/06/13 11:06:51 INFO MemoryStore: ensureFreeSpace(2328) called with 
> curMem=3672, maxMem=278302556
> 15/06/13 11:06:51 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 2.3 KB, free 265.4 MB)
> 15/06/13 11:06:51 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on localhost:56872 (size: 2.3 KB, free: 265.4 MB)
> 15/06/13 11:06:51 INFO SparkContext: Created broadcast 0 from broadcast at 
> DAGScheduler.scala:874
> 15/06/13 11:06:51 INFO DAGScheduler: Submitting 1 missing tasks from 
> ResultStage 0 (PythonRDD[1] at RDD at PythonRDD.scala:43)
> 15/06/13 11:06:51 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 15/06/13 11:06:51 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 15/06/13 11:06:51 INFO Executor: Fetching 
> file:/Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar with 
> timestamp 1434211571502
> 15/06/13 11:06:51 INFO Utils: 
> /Users/drake/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar has been 
> previously copied to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
> 15/06/13 11:06:51 INFO Executor: Fetching 
> file:/Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar with 
> timestamp 1434211571468
> 15/06/13 11:06:51 INFO Utils: 
> /Users/drake/.ivy2/jars/com.databricks_spark-csv_2.10-1.0.3.jar has been 
> previously copied to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
> 15/06/13 11:06:51 INFO Executor: Fetching 
> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar with 
> timestamp 1434211571326
> 15/06/13 11:06:51 INFO Utils: Fetching 
> http://10.0.0.222:56871/jars/org.apache.commons_commons-csv-1.1.jar to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp2449082240048543653.tmp
> 15/06/13 11:06:51 INFO Utils: 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp2449082240048543653.tmp
>  has been previously copied to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
> 15/06/13 11:06:51 INFO Executor: Adding 
> file:/private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/org.apache.commons_commons-csv-1.1.jar
>  to class loader
> 15/06/13 11:06:51 INFO Executor: Fetching 
> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar with 
> timestamp 1434211571303
> 15/06/13 11:06:51 INFO Utils: Fetching 
> http://10.0.0.222:56871/jars/com.databricks_spark-csv_2.10-1.0.3.jar to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp1396931258018379545.tmp
> 15/06/13 11:06:51 INFO Utils: 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/fetchFileTemp1396931258018379545.tmp
>  has been previously copied to 
> /private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
> 15/06/13 11:06:51 INFO Executor: Adding 
> file:/private/var/folders/7_/k5h82ws97b95v5f5h8wf9j0h0000gn/T/spark-f36f39f5-7f82-42e0-b3e0-9eb1e1cc0816/userFiles-1cab505b-7e88-4f9c-82d9-d6b361689d9d/com.databricks_spark-csv_2.10-1.0.3.jar
>  to class loader
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 3165, boot = 3155, init = 
> 10, finish = 0
> 15/06/13 11:06:54 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 666 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
> in 3505 ms on localhost (1/1)
> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
> have all completed, from pool
> 15/06/13 11:06:54 INFO DAGScheduler: ResultStage 0 (runJob at 
> PythonRDD.scala:366) finished in 3.525 s
> 15/06/13 11:06:54 INFO DAGScheduler: Job 0 finished: runJob at 
> PythonRDD.scala:366, took 3.852112 s
> 15/06/13 11:06:54 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:366
> 15/06/13 11:06:54 INFO DAGScheduler: Got job 1 (runJob at 
> PythonRDD.scala:366) with 4 output partitions (allowLocal=true)
> 15/06/13 11:06:54 INFO DAGScheduler: Final stage: ResultStage 1(runJob at 
> PythonRDD.scala:366)
> 15/06/13 11:06:54 INFO DAGScheduler: Parents of final stage: List()
> 15/06/13 11:06:54 INFO DAGScheduler: Missing parents: List()
> 15/06/13 11:06:54 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[2] 
> at RDD at PythonRDD.scala:43), which has no missing parents
> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(3672) called with 
> curMem=6000, maxMem=278302556
> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 3.6 KB, free 265.4 MB)
> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(2330) called with 
> curMem=9672, maxMem=278302556
> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 2.3 KB, free 265.4 MB)
> 15/06/13 11:06:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on localhost:56872 (size: 2.3 KB, free: 265.4 MB)
> 15/06/13 11:06:54 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:874
> 15/06/13 11:06:54 INFO DAGScheduler: Submitting 4 missing tasks from 
> ResultStage 1 (PythonRDD[2] at RDD at PythonRDD.scala:43)
> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Adding task set 1.0 with 4 tasks
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:54 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
> 15/06/13 11:06:54 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
> 15/06/13 11:06:54 INFO Executor: Running task 2.0 in stage 1.0 (TID 3)
> 15/06/13 11:06:54 INFO Executor: Running task 3.0 in stage 1.0 (TID 4)
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -15, init = 17, 
> finish = 0
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 9, boot = 6, init = 2, 
> finish = 1
> 15/06/13 11:06:54 INFO Executor: Finished task 3.0 in stage 1.0 (TID 4). 666 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 666 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 13, boot = 9, init = 4, 
> finish = 0
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 4) 
> in 24 ms on localhost (1/4)
> 15/06/13 11:06:54 INFO Executor: Finished task 2.0 in stage 1.0 (TID 3). 666 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) 
> in 28 ms on localhost (2/4)
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 3) 
> in 27 ms on localhost (3/4)
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 28, boot = 28, init = 0, 
> finish = 0
> 15/06/13 11:06:54 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 666 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) 
> in 42 ms on localhost (4/4)
> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks 
> have all completed, from pool
> 15/06/13 11:06:54 INFO DAGScheduler: ResultStage 1 (runJob at 
> PythonRDD.scala:366) finished in 0.044 s
> 15/06/13 11:06:54 INFO DAGScheduler: Job 1 finished: runJob at 
> PythonRDD.scala:366, took 0.063304 s
> 15/06/13 11:06:54 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:366
> 15/06/13 11:06:54 INFO DAGScheduler: Got job 2 (runJob at 
> PythonRDD.scala:366) with 3 output partitions (allowLocal=true)
> 15/06/13 11:06:54 INFO DAGScheduler: Final stage: ResultStage 2(runJob at 
> PythonRDD.scala:366)
> 15/06/13 11:06:54 INFO DAGScheduler: Parents of final stage: List()
> 15/06/13 11:06:54 INFO DAGScheduler: Missing parents: List()
> 15/06/13 11:06:54 INFO DAGScheduler: Submitting ResultStage 2 (PythonRDD[3] 
> at RDD at PythonRDD.scala:43), which has no missing parents
> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(3672) called with 
> curMem=12002, maxMem=278302556
> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_2 stored as values in 
> memory (estimated size 3.6 KB, free 265.4 MB)
> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(2330) called with 
> curMem=15674, maxMem=278302556
> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes 
> in memory (estimated size 2.3 KB, free 265.4 MB)
> 15/06/13 11:06:54 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory 
> on localhost:56872 (size: 2.3 KB, free: 265.4 MB)
> 15/06/13 11:06:54 INFO SparkContext: Created broadcast 2 from broadcast at 
> DAGScheduler.scala:874
> 15/06/13 11:06:54 INFO DAGScheduler: Submitting 3 missing tasks from 
> ResultStage 2 (PythonRDD[3] at RDD at PythonRDD.scala:43)
> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Adding task set 2.0 with 3 tasks
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 5, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 6, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 7, 
> localhost, PROCESS_LOCAL, 1708 bytes)
> 15/06/13 11:06:54 INFO Executor: Running task 0.0 in stage 2.0 (TID 5)
> 15/06/13 11:06:54 INFO Executor: Running task 1.0 in stage 2.0 (TID 6)
> 15/06/13 11:06:54 INFO Executor: Running task 2.0 in stage 2.0 (TID 7)
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -41, init = 43, 
> finish = 0
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -38, init = 40, 
> finish = 0
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 1, boot = -77, init = 78, 
> finish = 0
> 15/06/13 11:06:54 INFO Executor: Finished task 0.0 in stage 2.0 (TID 5). 666 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO Executor: Finished task 1.0 in stage 2.0 (TID 6). 666 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO Executor: Finished task 2.0 in stage 2.0 (TID 7). 722 
> bytes result sent to driver
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 5) 
> in 14 ms on localhost (1/3)
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 6) 
> in 13 ms on localhost (2/3)
> 15/06/13 11:06:54 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 7) 
> in 13 ms on localhost (3/3)
> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks 
> have all completed, from pool
> 15/06/13 11:06:54 INFO DAGScheduler: ResultStage 2 (runJob at 
> PythonRDD.scala:366) finished in 0.016 s
> 15/06/13 11:06:54 INFO DAGScheduler: Job 2 finished: runJob at 
> PythonRDD.scala:366, took 0.087522 s
> /Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/context.py:198:
>  UserWarning: Using RDD of dict to inferSchema is deprecated,please use 
> pyspark.sql.Row instead
>   warnings.warn("Using RDD of dict to inferSchema is deprecated,"
> 15/06/13 11:06:54 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
> localhost:56872 in memory (size: 2.3 KB, free: 265.4 MB)
> 15/06/13 11:06:54 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 
> localhost:56872 in memory (size: 2.3 KB, free: 265.4 MB)
> 15/06/13 11:06:54 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:366
> 15/06/13 11:06:54 INFO DAGScheduler: Got job 3 (runJob at 
> PythonRDD.scala:366) with 1 output partitions (allowLocal=true)
> 15/06/13 11:06:54 INFO DAGScheduler: Final stage: ResultStage 3(runJob at 
> PythonRDD.scala:366)
> 15/06/13 11:06:54 INFO DAGScheduler: Parents of final stage: List()
> 15/06/13 11:06:54 INFO DAGScheduler: Missing parents: List()
> 15/06/13 11:06:54 INFO DAGScheduler: Submitting ResultStage 3 (PythonRDD[4] 
> at RDD at PythonRDD.scala:43), which has no missing parents
> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(5120) called with 
> curMem=6002, maxMem=278302556
> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_3 stored as values in 
> memory (estimated size 5.0 KB, free 265.4 MB)
> 15/06/13 11:06:54 INFO MemoryStore: ensureFreeSpace(3338) called with 
> curMem=11122, maxMem=278302556
> 15/06/13 11:06:54 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes 
> in memory (estimated size 3.3 KB, free 265.4 MB)
> 15/06/13 11:06:54 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory 
> on localhost:56872 (size: 3.3 KB, free: 265.4 MB)
> 15/06/13 11:06:54 INFO SparkContext: Created broadcast 3 from broadcast at 
> DAGScheduler.scala:874
> 15/06/13 11:06:54 INFO DAGScheduler: Submitting 1 missing tasks from 
> ResultStage 3 (PythonRDD[4] at RDD at PythonRDD.scala:43)
> 15/06/13 11:06:54 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
> 15/06/13 11:06:54 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 8, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:54 INFO Executor: Running task 0.0 in stage 3.0 (TID 8)
> 15/06/13 11:06:54 INFO PythonRDD: Times: total = 2, boot = -29, init = 31, 
> finish = 0
> 15/06/13 11:06:55 INFO Executor: Finished task 0.0 in stage 3.0 (TID 8). 666 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 8) 
> in 10 ms on localhost (1/1)
> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks 
> have all completed, from pool
> 15/06/13 11:06:55 INFO DAGScheduler: ResultStage 3 (runJob at 
> PythonRDD.scala:366) finished in 0.011 s
> 15/06/13 11:06:55 INFO DAGScheduler: Job 3 finished: runJob at 
> PythonRDD.scala:366, took 0.035088 s
> 15/06/13 11:06:55 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:366
> 15/06/13 11:06:55 INFO DAGScheduler: Got job 4 (runJob at 
> PythonRDD.scala:366) with 4 output partitions (allowLocal=true)
> 15/06/13 11:06:55 INFO DAGScheduler: Final stage: ResultStage 4(runJob at 
> PythonRDD.scala:366)
> 15/06/13 11:06:55 INFO DAGScheduler: Parents of final stage: List()
> 15/06/13 11:06:55 INFO DAGScheduler: Missing parents: List()
> 15/06/13 11:06:55 INFO DAGScheduler: Submitting ResultStage 4 (PythonRDD[5] 
> at RDD at PythonRDD.scala:43), which has no missing parents
> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(5120) called with 
> curMem=14460, maxMem=278302556
> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_4 stored as values in 
> memory (estimated size 5.0 KB, free 265.4 MB)
> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(3337) called with 
> curMem=19580, maxMem=278302556
> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes 
> in memory (estimated size 3.3 KB, free 265.4 MB)
> 15/06/13 11:06:55 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory 
> on localhost:56872 (size: 3.3 KB, free: 265.4 MB)
> 15/06/13 11:06:55 INFO SparkContext: Created broadcast 4 from broadcast at 
> DAGScheduler.scala:874
> 15/06/13 11:06:55 INFO DAGScheduler: Submitting 4 missing tasks from 
> ResultStage 4 (PythonRDD[5] at RDD at PythonRDD.scala:43)
> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Adding task set 4.0 with 4 tasks
> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 9, 
> localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID 
> 10, localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 2.0 in stage 4.0 (TID 
> 11, localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 3.0 in stage 4.0 (TID 
> 12, localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:55 INFO Executor: Running task 3.0 in stage 4.0 (TID 12)
> 15/06/13 11:06:55 INFO Executor: Running task 1.0 in stage 4.0 (TID 10)
> 15/06/13 11:06:55 INFO Executor: Running task 2.0 in stage 4.0 (TID 11)
> 15/06/13 11:06:55 INFO Executor: Running task 0.0 in stage 4.0 (TID 9)
> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -79, init = 80, 
> finish = 0
> 15/06/13 11:06:55 INFO Executor: Finished task 3.0 in stage 4.0 (TID 12). 666 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -27, init = 28, 
> finish = 0
> 15/06/13 11:06:55 INFO Executor: Finished task 1.0 in stage 4.0 (TID 10). 666 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 3.0 in stage 4.0 (TID 
> 12) in 11 ms on localhost (1/4)
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 1.0 in stage 4.0 (TID 
> 10) in 14 ms on localhost (2/4)
> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 22, boot = 22, init = 0, 
> finish = 0
> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 21, boot = 21, init = 0, 
> finish = 0
> 15/06/13 11:06:55 INFO Executor: Finished task 2.0 in stage 4.0 (TID 11). 666 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO Executor: Finished task 0.0 in stage 4.0 (TID 9). 666 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 2.0 in stage 4.0 (TID 
> 11) in 37 ms on localhost (3/4)
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 9) 
> in 43 ms on localhost (4/4)
> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks 
> have all completed, from pool
> 15/06/13 11:06:55 INFO DAGScheduler: ResultStage 4 (runJob at 
> PythonRDD.scala:366) finished in 0.044 s
> 15/06/13 11:06:55 INFO DAGScheduler: Job 4 finished: runJob at 
> PythonRDD.scala:366, took 0.059163 s
> 15/06/13 11:06:55 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:366
> 15/06/13 11:06:55 INFO DAGScheduler: Got job 5 (runJob at 
> PythonRDD.scala:366) with 3 output partitions (allowLocal=true)
> 15/06/13 11:06:55 INFO DAGScheduler: Final stage: ResultStage 5(runJob at 
> PythonRDD.scala:366)
> 15/06/13 11:06:55 INFO DAGScheduler: Parents of final stage: List()
> 15/06/13 11:06:55 INFO DAGScheduler: Missing parents: List()
> 15/06/13 11:06:55 INFO DAGScheduler: Submitting ResultStage 5 (PythonRDD[6] 
> at RDD at PythonRDD.scala:43), which has no missing parents
> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(5120) called with 
> curMem=22917, maxMem=278302556
> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_5 stored as values in 
> memory (estimated size 5.0 KB, free 265.4 MB)
> 15/06/13 11:06:55 INFO MemoryStore: ensureFreeSpace(3338) called with 
> curMem=28037, maxMem=278302556
> 15/06/13 11:06:55 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes 
> in memory (estimated size 3.3 KB, free 265.4 MB)
> 15/06/13 11:06:55 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory 
> on localhost:56872 (size: 3.3 KB, free: 265.4 MB)
> 15/06/13 11:06:55 INFO SparkContext: Created broadcast 5 from broadcast at 
> DAGScheduler.scala:874
> 15/06/13 11:06:55 INFO DAGScheduler: Submitting 3 missing tasks from 
> ResultStage 5 (PythonRDD[6] at RDD at PythonRDD.scala:43)
> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Adding task set 5.0 with 3 tasks
> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 
> 13, localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 
> 14, localhost, PROCESS_LOCAL, 1665 bytes)
> 15/06/13 11:06:55 INFO TaskSetManager: Starting task 2.0 in stage 5.0 (TID 
> 15, localhost, PROCESS_LOCAL, 1708 bytes)
> 15/06/13 11:06:55 INFO Executor: Running task 0.0 in stage 5.0 (TID 13)
> 15/06/13 11:06:55 INFO Executor: Running task 1.0 in stage 5.0 (TID 14)
> 15/06/13 11:06:55 INFO Executor: Running task 2.0 in stage 5.0 (TID 15)
> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -24, init = 25, 
> finish = 0
> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 1, boot = -24, init = 25, 
> finish = 0
> 15/06/13 11:06:55 INFO Executor: Finished task 0.0 in stage 5.0 (TID 13). 666 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO Executor: Finished task 2.0 in stage 5.0 (TID 15). 716 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 
> 13) in 12 ms on localhost (1/3)
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 
> 15) in 11 ms on localhost (2/3)
> 15/06/13 11:06:55 INFO PythonRDD: Times: total = 21, boot = 20, init = 1, 
> finish = 0
> 15/06/13 11:06:55 INFO Executor: Finished task 1.0 in stage 5.0 (TID 14). 666 
> bytes result sent to driver
> 15/06/13 11:06:55 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 
> 14) in 36 ms on localhost (3/3)
> 15/06/13 11:06:55 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks 
> have all completed, from pool
> 15/06/13 11:06:55 INFO DAGScheduler: ResultStage 5 (runJob at 
> PythonRDD.scala:366) finished in 0.038 s
> 15/06/13 11:06:55 INFO DAGScheduler: Job 5 finished: runJob at 
> PythonRDD.scala:366, took 0.052978 s
> 15/06/13 11:06:56 INFO HiveContext: Initializing execution hive, version 0.13.1
> 15/06/13 11:06:56 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
> 15/06/13 11:06:56 INFO ObjectStore: ObjectStore, initialize called
> 15/06/13 11:06:57 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
> 15/06/13 11:06:57 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 15/06/13 11:06:57 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
> 15/06/13 11:06:58 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_5_piece0 on localhost:56872 in memory (size: 3.3 KB, free: 265.4 MB)
> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_4_piece0 on localhost:56872 in memory (size: 3.3 KB, free: 265.4 MB)
> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_3_piece0 on localhost:56872 in memory (size: 3.3 KB, free: 265.4 MB)
> 15/06/13 11:06:59 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:56872 in memory (size: 2.3 KB, free: 265.4 MB)
> 15/06/13 11:07:00 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 15/06/13 11:07:00 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
> 15/06/13 11:07:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:04 INFO ObjectStore: Initialized ObjectStore
> 15/06/13 11:07:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
> 15/06/13 11:07:05 INFO HiveMetaStore: Added admin role in metastore
> 15/06/13 11:07:05 INFO HiveMetaStore: Added public role in metastore
> 15/06/13 11:07:05 INFO HiveMetaStore: No user is added in admin role, since config is empty
> 15/06/13 11:07:05 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
> 15/06/13 11:07:06 INFO HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
> 15/06/13 11:07:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/06/13 11:07:08 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
> 15/06/13 11:07:08 INFO ObjectStore: ObjectStore, initialize called
> 15/06/13 11:07:08 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
> 15/06/13 11:07:08 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 15/06/13 11:07:08 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
> 15/06/13 11:07:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
> 15/06/13 11:07:11 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 15/06/13 11:07:11 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
> 15/06/13 11:07:12 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:12 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:13 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:13 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
> 15/06/13 11:07:14 INFO ObjectStore: Initialized ObjectStore
> 15/06/13 11:07:14 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
> 15/06/13 11:07:14 INFO HiveMetaStore: Added admin role in metastore
> 15/06/13 11:07:15 INFO HiveMetaStore: Added public role in metastore
> 15/06/13 11:07:15 INFO HiveMetaStore: No user is added in admin role, since config is empty
> 15/06/13 11:07:15 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
> >>>
> >>> df.save("/tmp/d.csv", "com.databricks.spark.csv")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/dataframe.py", line 202, in save
>     return self.write.save(path, source, mode, **options)
>   File "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 295, in save
>     self._jwrite.save(path)
>   File "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o86.save.
> : java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
>       at scala.sys.package$.error(package.scala:27)
>       at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
>       at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:302)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>       at py4j.Gateway.invoke(Gateway.java:259)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:207)
>       at java.lang.Thread.run(Thread.java:744)
> >>>
> ...
> >>> df.write.format("com.databricks.spark.csv").save("/tmp/d.csv")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 295, in save
>     self._jwrite.save(path)
>   File "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/drake/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o90.save.
> : java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
>       at scala.sys.package$.error(package.scala:27)
>       at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
>       at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:302)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>       at py4j.Gateway.invoke(Gateway.java:259)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:207)
>       at java.lang.Thread.run(Thread.java:744)
> >>>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)