Re: Does Spark support the Hive posexplode function?
It seems this feature was added in Hive 0.13: https://issues.apache.org/jira/browse/HIVE-4943. I would assume it is supported, since Spark is by default compiled against Hive 0.13.1.

On Sun, Jul 12, 2015 at 7:42 PM, Ruslan Dautkhanov dautkha...@gmail.com wrote:

You can see which Spark SQL functions are supported by running the following in a notebook:

    %sql show functions

https://forums.databricks.com/questions/665/is-hive-coalesce-function-supported-in-sparksql.html

I think Spark SQL support is currently around Hive ~0.11?

-- Ruslan Dautkhanov

On Tue, Jul 7, 2015 at 3:10 PM, Jeff J Li l...@us.ibm.com wrote:

I am trying to use the posexplode function in the HiveContext to auto-generate a sequence number. This feature is supposed to be available in Hive 0.13.0.

    SELECT name, phone FROM contact LATERAL VIEW posexplode(phoneList.phoneNumber) phoneTable AS pos, phone

My test program failed with the following:

    java.lang.ClassNotFoundException: posexplode
        at java.net.URLClassLoader.findClass(URLClassLoader.java:665)
        at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:942)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:851)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:827)
        at org.apache.spark.sql.hive.HiveFunctionWrapper.createFunction(Shim13.scala:147)
        at org.apache.spark.sql.hive.HiveGenericUdtf.function$lzycompute(hiveUdfs.scala:274)
        at org.apache.spark.sql.hive.HiveGenericUdtf.function(hiveUdfs.scala:274)

Does Spark support this Hive function posexplode? If not, how can I patch it to support this? I am on Spark 1.3.1.

Thanks,
Jeff Li
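For reference, here is a minimal spark-shell-style sketch of driving this query through a HiveContext (Spark 1.3 API). The contact data below is invented, and the nested phoneList struct from the original query is flattened to a plain array column for brevity; posexplode itself still needs Hive 0.13+ classes on the classpath, otherwise it fails as in the stack trace above.

    // Minimal sketch (Spark 1.3 API): drive the posexplode query through a HiveContext.
    // Data and column names here are hypothetical; the original phoneList.phoneNumber
    // struct field is flattened to a plain array<string> column for brevity.
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)   // sc is the SparkContext predefined in spark-shell
    import hiveContext.implicits._

    val contacts = Seq(
      ("alice", Seq("555-0100", "555-0101")),
      ("bob",   Seq("555-0200"))
    ).toDF("name", "phoneNumber")
    contacts.registerTempTable("contact")

    // posexplode emits (position, element) pairs; it resolves only if Hive 0.13+
    // is on the classpath, which is what the ClassNotFoundException above is about.
    hiveContext.sql(
      """SELECT name, pos, phone
        |FROM contact
        |LATERAL VIEW posexplode(phoneNumber) phoneTable AS pos, phone""".stripMargin
    ).show()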
Re: How to upgrade Spark version in CDH 5.4
As Sean suggested, you can actually build Spark 1.4 for CDH 5.4.x and also include the Hive 0.13.1 libraries, but *this will be completely unsupported by Cloudera*. I would suggest doing that only if you want to experiment with new features from Spark 1.4, e.g. running SparkSQL with sort-merge join for expensive BI types of queries over a Hadoop YARN cluster (a build sketch follows at the end of this thread).

You just need to keep the build in your user folder across the cluster, include the path to the binaries (spark-shell, spark-submit, pyspark, etc.) in your user profile, and make sure you include the CDH classpath to load all dependencies.

On Sun, Jul 12, 2015 at 8:19 PM, Sean Owen so...@cloudera.com wrote:

Yeah, it won't technically be supported, and you shouldn't go modifying the actual installation, but if you just make your own build of 1.4 for CDH 5.4 and use that build to launch YARN-based apps, I imagine it will Just Work for most any use case.

On Sun, Jul 12, 2015 at 7:34 PM, Ruslan Dautkhanov dautkha...@gmail.com wrote:

Good question. I'd like to know the same, although I think you'll lose supportability.

-- Ruslan Dautkhanov

On Wed, Jul 8, 2015 at 2:03 AM, Ashish Dutt ashish.du...@gmail.com wrote:

Hi, I need to upgrade Spark from version 1.3 to version 1.4 on CDH 5.4. I checked the documentation here but I do not see anything relevant. Any suggestions directing to a solution are welcome.

Thanks,
Ashish
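For completeness, here is roughly what such a private build could look like. This is a sketch, not a supported procedure: the profiles are the standard Spark 1.4 Maven build flags, but the exact 2.6.0-cdh5.4.x version string and the install paths are assumptions; substitute whatever your cluster actually runs.

    # Sketch only: build Spark 1.4 against CDH 5.4's Hadoop (unsupported by Cloudera).
    # The hadoop.version below is an assumption; use the CDH artifact version that
    # matches your cluster.
    ./make-distribution.sh --name cdh5.4 --tgz \
      -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.4.2 \
      -Phive -Phive-thriftserver

    # Keep the resulting build in your home directory and point your profile at it,
    # letting it pick up the cluster's existing Hadoop/YARN configuration:
    export SPARK_HOME=$HOME/spark-1.4.0-bin-cdh5.4
    export PATH=$SPARK_HOME/bin:$PATH
    export HADOOP_CONF_DIR=/etc/hadoop/conf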
SparkSQL cache table with multiple replicas
Hi all, do you know if there is an option to specify how many replicas we want when caching a table in memory in the SparkSQL Thrift server? I have not seen any such option so far, but I assumed one exists, since the Storage section of the UI shows that there is 1x replica of your DataFrame/table. I believe there is a good use case for replicating a dimension table across your nodes to improve response times when running typical BI/DWH types of queries (just to avoid having to broadcast the data time and again). Do you think that would be a good addition to SparkSQL? Regards.
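As far as I can tell, CACHE TABLE itself takes no replica count, but from application code (or spark-shell) the replicated storage levels can approximate this. A sketch, assuming the sqlContext predefined in spark-shell and a hypothetical dim_customer table:

    // Sketch: cache a dimension table with two in-memory replicas instead of the
    // default single copy; the *_2 storage levels replicate each cached partition
    // to a second executor. Table names here are hypothetical.
    import org.apache.spark.storage.StorageLevel

    val dim = sqlContext.table("dim_customer")
    dim.persist(StorageLevel.MEMORY_ONLY_2)       // 2x in-memory replication
    dim.registerTempTable("dim_customer_cached")
    dim.count()   // materialize the cache so the replicas show up in the Storage tab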
Issues when saving DataFrames in Spark 1.4 with Parquet format
Hi chaps, it seems there is an issue when saving DataFrames in Spark 1.4. The default file extension inside the Hive warehouse folder is now part-r-X.gz.parquet, but the SparkSQL Thrift server is still looking for part-r-X.parquet when running queries. Is there any config parameter we can use as a workaround? Is there a JIRA open about this? Am I missing anything in the upgrade from Spark 1.3 to 1.4? The only similar reference I have seen is this: http://mail-archives.apache.org/mod_mbox/spark-user/201506.mbox/%3ccahp0wa+japfvj+pc2mzwomzb+mmdozfbr-xaxdbkoppe68t...@mail.gmail.com%3E Thanks.
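I don't know of a config that changes the file-name pattern the Thrift server expects, but since the .gz infix in the file name reflects the compression codec, one experiment is to write with an explicit codec and see whether the files are then picked up. This is an unverified sketch, not a confirmed fix; spark.sql.parquet.compression.codec is a real setting, but whether it resolves this particular mismatch is an assumption:

    // Sketch of a possible workaround, not a confirmed fix: the .gz infix in
    // part-r-X.gz.parquet comes from the compression codec, so writing uncompressed
    // Parquet may sidestep the naming mismatch. Table and path are hypothetical.
    sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")

    val df = sqlContext.table("mytable")
    df.write
      .format("parquet")
      .mode("overwrite")
      .save("/user/hive/warehouse/mytable_uncompressed")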