Re: UDTF registration fails for hiveEnabled SQLContext

2018-05-15 Thread Mick Davies
I am trying to register a UDTF, not a UDF, so I don't think this applies. Mick

Re: UDTF registration fails for hiveEnabled SQLContext

2018-05-15 Thread Mick Davies
Hi Gourav, I don't think you can register UDTFs via sparkSession.udf.register. Mick
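For context, a minimal illustration of the gap, assuming Spark 2.x (the function names below are just examples): spark.udf.register accepts plain Scala functions, but has no overload taking a Hive GenericUDTF, which emits zero or more output rows per input row.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Fine: a scalar UDF registered from an ordinary Scala function.
    spark.udf.register("toUpper", (s: String) => s.toUpperCase)

    // No equivalent for a table-generating function: udf.register has no
    // overload for org.apache.hadoop.hive.ql.udf.generic.GenericUDTF, so a
    // UDTF has to go through the Hive function registry instead (see the
    // CREATE TEMPORARY FUNCTION sketch further down).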

Re: UDTF registration fails for hiveEnabled SQLContext

2018-05-14 Thread Mick Davies
The examples were lost by formatting. The exception is:

    org.apache.spark.sql.AnalysisException: No handler for UDAF
    'com.iqvia.rwas.omop.udtf.ParallelExplode'.
    Use sparkSession.udf.register(...) instead.; line 1 pos 7

UDTF registration fails for hiveEnabled SQLContext

2018-05-11 Thread Mick Davies
Hi, if I try to register a UDTF using SQLContext (with enableHiveSupport set) using the code [lost in formatting], I get the error shown in the follow-up above. It works OK if I use the deprecated HiveContext. Is there a way to register a UDTF without using deprecated code? This is happening in some tests I am writing using … but I …
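Since the original snippet was lost, a hedged sketch of the Hive-side registration path that may avoid the deprecated HiveContext: register the UDTF class through SQL, so it goes into the Hive function registry rather than through sparkSession.udf.register. Assumes Spark 2.x with the spark-hive module; the table and column names are illustrative.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .enableHiveSupport() // needs spark-hive on the classpath
      .getOrCreate()

    // Register the Hive UDTF by class name via SQL instead of udf.register.
    spark.sql(
      "CREATE TEMPORARY FUNCTION parallel_explode AS " +
      "'com.iqvia.rwas.omop.udtf.ParallelExplode'")

    // Use it as a table-generating function in a query.
    spark.sql("SELECT parallel_explode(items) FROM events").show()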

Custom order by in Spark SQL

2015-07-01 Thread Mick Davies
Hi, is there a way to specify a custom order by (Ordering) on a column in Spark SQL? In particular, I would like the order by applied to a currency column to be not alphabetical but something like USD, EUR, JPY, GBP, etc. I saw an earlier post on UDTs and ordering (which I can't seem to …
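One workaround in plain SQL, absent true custom Ordering support: derive an explicit rank per currency code and sort on it. A minimal sketch using the modern SparkSession API for brevity; the table and column names are illustrative.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    spark.sql("""
      SELECT *
      FROM trades
      ORDER BY CASE currency
                 WHEN 'USD' THEN 0
                 WHEN 'EUR' THEN 1
                 WHEN 'JPY' THEN 2
                 WHEN 'GBP' THEN 3
                 ELSE 99  -- unknown currencies sort last
               END,
               currency   -- alphabetical tie-break among the rest
    """).show()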

Setting maxPrintString in Spark Repl to view SQL query plans

2015-02-03 Thread Mick Davies
Hi, I want to increase maxPrintString in the Spark REPL to look at SQL query plans, as they are truncated by default at 800 chars, but I don't know how to set this. You don't seem to be able to do it in the same way as you would with the Scala REPL. Anyone know how to set this? Also, anyone …
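Two approaches that have worked, both hedged since the exact mechanism varies by Scala version: the power-mode setting in the 2.10-era REPL that Spark shipped with at the time, and a system property honoured by later REPLs.

    // Inside spark-shell (Scala 2.10-era REPL): enter power mode and raise
    // the interpreter's print limit from its 800-char default.
    scala> :power
    scala> vals.isettings.maxPrintString = 100000

    // On Scala 2.11+ REPLs the limit can instead be set at launch via a
    // system property, e.g.:
    //   spark-shell --driver-java-options "-Dscala.repl.maxprintstring=100000"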

Re: Spark SQL Parquet - data are reading very very slow

2015-02-01 Thread Mick Davies
Dictionary encoding of Strings from Parquet has now been added and will be in 1.3. This should reduce UTF8-to-String decoding significantly: https://issues.apache.org/jira/browse/SPARK-5309

[SQL] Using HashPartitioner to distribute by column

2015-01-19 Thread Mick Davies
Is it possible to use a HashPartitioner or something similar to distribute a SchemaRDD's data by the hash of a particular column or set of columns? Having done this, I would then hope that GROUP BY could avoid a shuffle. E.g. set up a HashPartitioner on the CustomerCode field so that SELECT CustomerCode, …
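A sketch of the intent using the later DataFrame API (the post predates it; on a 1.2-era SchemaRDD the nearest equivalent is keying the underlying RDD and calling partitionBy with a HashPartitioner). The data and names are illustrative, and whether the planner actually reuses the layout depends on the Spark version.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val sales = Seq(("C1", 10.0), ("C2", 5.0), ("C1", 7.5))
      .toDF("CustomerCode", "Amount")

    // Hash-distribute rows so each CustomerCode lands in a single partition.
    val byCustomer = sales.repartition($"CustomerCode")

    // An aggregation keyed on the same column can then reuse that layout
    // rather than shuffling again.
    byCustomer.groupBy($"CustomerCode").sum("Amount").show()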

Re: Spark SQL Parquet - data are reading very very slow

2015-01-19 Thread Mick Davies
Added a JIRA to track: https://issues.apache.org/jira/browse/SPARK-5309

Re: Spark SQL Parquet - data are reading very very slow

2015-01-16 Thread Mick Davies
I have seen similar results. I have looked at the code and I think there are a couple of contributors: encoding/decoding Java Strings to/from UTF-8 bytes is quite expensive. I'm not sure what you can do about that. But there are options for optimization due to the repeated decoding of the same String …
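A toy illustration of that last point (a sketch, not the actual SPARK-5309 change): when a column holds few distinct values, memoising decoded Strings by byte content means each distinct value pays the UTF-8 decode cost only once.

    import java.nio.charset.StandardCharsets
    import scala.collection.mutable

    object CachedUtf8Decoder {
      // Key on an immutable byte view: Array[Byte] itself compares by reference.
      private val cache = mutable.HashMap.empty[Seq[Byte], String]

      def decode(bytes: Array[Byte]): String =
        cache.getOrElseUpdate(bytes.toSeq, new String(bytes, StandardCharsets.UTF_8))

      def main(args: Array[String]): Unit = {
        val usd = "USD".getBytes(StandardCharsets.UTF_8)
        // The second call hits the cache: same String instance, no decoding.
        println(decode(usd) eq decode(usd.clone())) // prints: true
      }
    }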