Jeff Zhang created SPARK-11205:
----------------------------------

             Summary: Delegate to Scala DataFrame API rather than print in Python
                 Key: SPARK-11205
                 URL: https://issues.apache.org/jira/browse/SPARK-11205
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 1.5.1
            Reporter: Jeff Zhang
            Priority: Minor


When I use DataFrame#explain(), I find that its output differs slightly from
the Scala API's output. Here's one example:
{noformat}
== Physical Plan ==    // this line is missing from the PySpark output
Scan JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json][age#0L,name#1]
{noformat}

After looking at the code, I found that PySpark prints the output itself
rather than delegating to spark-sql. This causes the difference between the
Scala and Python APIs. I think both APIs just print to standard out, so the
Python API could delegate to the Scala API. Here are some APIs I found that
could be delegated to the Scala API directly:
* printSchema()
* explain()
* show()
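To illustrate the difference, here is a minimal, self-contained sketch. It does not use Spark at all: `FakeJavaDataFrame` is a hypothetical stand-in for the Py4J handle (`_jdf`) that a PySpark DataFrame wraps, and `PythonSidePrinting` and `DelegatingWrapper` are hypothetical names for the two designs. The first wrapper rebuilds the output from a lower-level plan string, so its format can drift from Scala's (here it loses the header line); the second forwards to the Scala-side method, so both APIs print exactly the same text.

```python
import contextlib
import io


class FakeJavaDataFrame:
    """Hypothetical stand-in for the Py4J handle (_jdf) behind a PySpark
    DataFrame. It only imitates the Scala-side output format; no Spark or
    JVM is involved, and it prints through Python's own stdout."""

    def __init__(self, plan_text):
        self._plan_text = plan_text

    def plan_string(self):
        # Lower-level accessor: the plan text only, with no header line.
        return self._plan_text

    def explain(self, extended=False):
        # Scala-side explain(): prints a header followed by the plan.
        # `extended` is accepted for signature parity but ignored in this stub.
        print("== Physical Plan ==")
        print(self._plan_text)


class PythonSidePrinting:
    """Python wrapper that formats the output itself (mimics the reported behaviour)."""

    def __init__(self, jdf):
        self._jdf = jdf

    def explain(self, extended=False):
        # Rebuilds the output from a lower-level string, so the header is lost.
        print(self._jdf.plan_string())


class DelegatingWrapper:
    """Python wrapper that forwards to the Scala-side method (the proposal)."""

    def __init__(self, jdf):
        self._jdf = jdf

    def explain(self, extended=False):
        # A single implementation of the output format, shared by both APIs.
        self._jdf.explain(extended)


def capture(fn, *args):
    """Run fn(*args) and return everything it printed to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        fn(*args)
    return buf.getvalue()


if __name__ == "__main__":
    jdf = FakeJavaDataFrame("Scan JSONRelation[people.json][age#0L,name#1]")
    print(capture(PythonSidePrinting(jdf).explain), end="")
    print(capture(DelegatingWrapper(jdf).explain), end="")
```

One caveat worth checking before delegating the printing itself: in real PySpark, text printed by the JVM goes to the JVM process's standard out, which is not necessarily where Python's `sys.stdout` goes (for example, in a notebook). A variant that keeps the formatting in one place while avoiding this is to have the Scala side return the formatted string and let Python print it.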



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
