Michael,
Thanks. Is this still turned off in the released 1.2? Is it possible to turn it on just to get an idea of how much of a difference it makes?

-Jerry

On 05/12/14 12:40 am, Michael Armbrust wrote:
I'll add that some of our data formats will actually infer this sort of
useful information automatically.  Both Parquet and cached in-memory
tables keep statistics on the min/max value for each column.  When you
have predicates over these sorted columns, partitions will be eliminated
if they can't possibly match the predicate given the statistics.
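
For example, something along these lines exercises the in-memory
statistics (the table name, column, and path below are just
illustrative):

  // Cache a table so the in-memory columnar store collects per-column stats.
  val events = sqlContext.parquetFile("hdfs:///data/events")  // path is made up
  events.registerTempTable("events")
  sqlContext.cacheTable("events")

  // Cached batches whose min/max range cannot satisfy the predicate
  // on "ts" are skipped instead of being scanned.
  sqlContext.sql("SELECT * FROM events WHERE ts > 1417651200").collect()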

For Parquet this is new in Spark 1.2 and it is turned off by default
(due to bugs we are working with the Parquet library team to fix).
Hopefully soon it will be on by default.
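
If you want to experiment, I believe the relevant switch is the
spark.sql.parquet.filterPushdown property (the exact name is from
memory, so double-check it), e.g.:

  // Turn on Parquet filter push-down; it is off by default in 1.2.
  sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")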

On Wed, Dec 3, 2014 at 8:44 PM, Cheng, Hao <hao.ch...@intel.com> wrote:

    You can try to write your own Relation with filter push-down, or use
    ParquetRelation2 as a workaround.
    (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala)
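
    A very rough, untested sketch against the 1.2 external data sources
    API (the class name, column name, and skipping logic below are made
    up; only the PrunedFilteredScan trait and its buildScan signature
    come from the API):

        import org.apache.spark.rdd.RDD
        import org.apache.spark.sql.{Row, SQLContext, StructType}
        import org.apache.spark.sql.sources.{BaseRelation, Filter, GreaterThan, PrunedFilteredScan}

        // Hypothetical relation over a file known to be sorted on "ts".
        case class SortedFileRelation(path: String, schema: StructType)
                                     (@transient val sqlContext: SQLContext)
          extends BaseRelation with PrunedFilteredScan {

          override def buildScan(requiredColumns: Array[String],
                                 filters: Array[Filter]): RDD[Row] = {
            // Because the file is sorted on "ts", a filter such as
            // GreaterThan("ts", v) could be used here to skip whole
            // splits before any rows are produced. The actual skipping
            // logic is omitted in this sketch.
            sqlContext.sparkContext.emptyRDD[Row]
          }
        }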

    Cheng Hao

    -----Original Message-----
    From: Jerry Raj [mailto:jerry....@gmail.com]
    Sent: Thursday, December 4, 2014 11:34 AM
    To: user@spark.apache.org
    Subject: Spark SQL with a sorted file

    Hi,
    If I create a SchemaRDD from a file that I know is sorted on a
    certain field, is it possible to somehow pass that information on to
    Spark SQL so that SQL queries referencing that field are optimized?
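
    For concreteness, a rough sketch of what I mean (the case class,
    path, and column names are just placeholders):

        // An RDD of a case class, built from a file already sorted on "ts".
        case class Event(ts: Long, value: String)
        val events = sc.textFile("hdfs:///data/events.sorted")
          .map(_.split(","))
          .map(p => Event(p(0).toLong, p(1)))

        // Turn it into a SchemaRDD and query on the sort field.
        import sqlContext.createSchemaRDD
        events.registerTempTable("events")
        sqlContext.sql("SELECT * FROM events WHERE ts > 1417651200")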

    Thanks
    -Jerry



