Re: Data source aliasing

2015-07-30 Thread Mridul Muralidharan
It would be a good idea to generalize this for Spark core and allow for its use in serde, compression, etc. Regards, Mridul On Thu, Jul 30, 2015 at 11:33 AM, Joseph Batchik josephbatc...@gmail.com wrote: ...
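
[Editor's note: a rough sketch of what that generalization might look like, in Scala. Every name here (AliasedProvider, AliasResolver) is illustrative rather than an actual Spark API; the point is that the same ServiceLoader pattern could alias serializers or compression codecs as easily as SQL data sources.]

    import java.util.ServiceLoader
    import scala.collection.JavaConverters._

    // Illustrative trait: any pluggable component exposes a short alias.
    trait AliasedProvider {
      def alias: String
    }

    object AliasResolver {
      // Returns every provider on the classpath claiming the alias, so
      // the caller decides the conflict policy (first match, fail, warn).
      def resolve[T <: AliasedProvider](alias: String, iface: Class[T],
                                        loader: ClassLoader): Seq[T] =
        ServiceLoader.load(iface, loader).asScala
          .filter(_.alias.equalsIgnoreCase(alias)).toSeq
    }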

Data source aliasing

2015-07-30 Thread Joseph Batchik
Hi all, There are now starting to be a lot of data source packages for Spark. An annoyance I see is that I have to type in the full class name, like: sqlContext.read.format("com.databricks.spark.avro").load(path). Spark internally has formats such as parquet and jdbc registered, and it would be nice ...
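
[Editor's note: for concreteness, the ergonomics being proposed look roughly like this REPL-style sketch. The "avro" short name is hypothetical; at the time of this thread only built-in formats had aliases, and sqlContext and path are assumed to be in scope.]

    // Today: the fully qualified class name must be spelled out.
    val df = sqlContext.read.format("com.databricks.spark.avro").load(path)

    // With a registered short name, this would shrink to the same shape
    // the built-in "parquet" and "jdbc" formats already enjoy:
    val df2 = sqlContext.read.format("avro").load(path)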

Re: Data source aliasing

2015-07-30 Thread Patrick Wendell
Yeah, this could make sense - allowing data sources to register a short name. What mechanism did you have in mind? To use the jar service loader? The only issue is that there could be conflicts, since many of these are third-party packages. If the same name were registered twice I'm not sure what ...
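
[Editor's note: a minimal sketch of the jar service loader mechanism under discussion. The trait name, method, and service-file path are placeholders, not Spark's actual API at the time of this thread.]

    import java.util.ServiceLoader
    import scala.collection.JavaConverters._

    // Illustrative registration trait a data source package implements.
    trait DataSourceRegister {
      def shortName(): String
    }

    object DataSourceAliases {
      // A package advertises its implementation in a classpath resource,
      // e.g. META-INF/services/org.example.DataSourceRegister, which
      // ServiceLoader reads to discover providers at runtime.
      def findProviders(alias: String): Seq[DataSourceRegister] =
        ServiceLoader.load(classOf[DataSourceRegister],
                           Thread.currentThread().getContextClassLoader)
          .asScala.filter(_.shortName().equalsIgnoreCase(alias)).toSeq
    }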

Re: Data source aliasing

2015-07-30 Thread Michael Armbrust
+1 On Thu, Jul 30, 2015 at 11:18 AM, Patrick Wendell pwend...@gmail.com wrote: ...

Re: Data source aliasing

2015-07-30 Thread Joseph Batchik
Yep, I was looking into using the jar service loader. I pushed a rough draft to my fork of Spark: https://github.com/JDrit/spark/commit/946186e3f17ddcc54acf2be1a34aebf246b06d2f Right now it will use the first alias it finds, but I can change that to check them all and report an error if it finds ...
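
[Editor's note: the two conflict policies Joseph describes would differ roughly as below, building on the hypothetical DataSourceAliases sketch from Patrick's message above; neither is a real Spark implementation.]

    // Builds on the illustrative DataSourceAliases object above.
    object ConflictPolicies {
      import DataSourceAliases.findProviders

      // Draft behavior: take the first provider found, ignore duplicates.
      def resolveFirst(alias: String): Option[DataSourceRegister] =
        findProviders(alias).headOption

      // Stricter alternative: check all providers and fail loudly when
      // two packages on the classpath claim the same short name.
      def resolveStrict(alias: String): DataSourceRegister =
        findProviders(alias) match {
          case Seq(single) => single
          case Seq() => sys.error(s"No data source registered for '$alias'")
          case many => sys.error(s"Conflicting registrations for '$alias': " +
            many.map(_.getClass.getName).mkString(", "))
        }
    }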