There is a bit of weirdness in how we override the default query planner to
replace it with an incrementalizing planner.  As a result, calling any
operation that changes the query plan (such as a LIMIT) on that DataFrame
would cause it to revert to the batch planner and return the wrong answer.
We should fix this before we finalize the Sink API.
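
To make that concrete, here is a minimal sketch of the workaround ConsoleSink uses, written as a hypothetical custom sink (the class name `DebugSink` and its parameters are illustrative, not part of Spark; the `Sink` trait shown is the Spark 2.x internal API). Collecting the rows to the driver and rebuilding a plain batch DataFrame means show() runs through the ordinary batch planner instead of the incremental one:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical sink illustrating the pattern from ConsoleSink.
class DebugSink(numRowsToShow: Int, isTruncated: Boolean) extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Calling show() directly on `data` would re-plan the query and can
    // return wrong results, so materialize the rows on the driver and
    // rebuild a batch DataFrame with the same schema first.
    val batchDf = data.sparkSession.createDataFrame(
      data.sparkSession.sparkContext.parallelize(data.collect()),
      data.schema)
    batchDf.show(numRowsToShow, isTruncated)
  }
}
```

Note the obvious cost of this sketch: collect() pulls the whole batch onto the driver, so it is only suitable for debugging-sized output.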

On Mon, Jun 19, 2017 at 9:32 AM, assaf.mendelson <assaf.mendel...@rsa.com>
wrote:

> Hi all,
>
> I am playing around with structured streaming and looked at the code for
> ConsoleSink.
>
>
>
> I see the code has:
>
>
>
> data.sparkSession.createDataFrame(
>     data.sparkSession.sparkContext.parallelize(data.collect()), data.schema)
>   .show(numRowsToShow, isTruncated)
>
>
>
> I was wondering why it does not use data directly. Why the collect and
> parallelize?
>
>
>
>
>
> Thanks,
>
>               Assaf.
>
>
>
> Sent from the Apache Spark Developers List mailing list archive
> <http://apache-spark-developers-list.1001551.n3.nabble.com/> at
> Nabble.com.
>
