Spark-SQL 1.6.2 w/Hive UDF @Description

2016-12-23 Thread Lavelle, Shawn
Hello Spark Users, I have a Hive UDF that I'm trying to use with Spark-SQL. It's showing up a bit awkwardly: I can load it into the Hive Thrift Server with a "Create function..." query against the hive context. I can then use the UDF in queries. However, a "desc function " says the
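For readers following along, registering a permanent Hive UDF against the HiveContext typically looks like the sketch below (the jar path, function name, and class name are illustrative placeholders, not details from the thread):

```sql
-- Register the UDF through the Hive Thrift Server / HiveContext
CREATE FUNCTION my_upper AS 'com.example.udf.MyUpper'
  USING JAR 'hdfs:///udfs/my-udfs.jar';

-- The function is then usable in queries
SELECT my_upper(name) FROM people;

-- The thread reports that metadata (e.g. the @Description annotation)
-- does not surface as expected here
DESC FUNCTION my_upper;
```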

Re: Approach: Incremental data load from HBASE

2016-12-23 Thread Chetan Khatri
Ted, correct. In my case I want incremental import from HBase and incremental load to Hive. Both approaches discussed earlier with indexing seem accurate to me. But just as Sqoop supports incremental import and load for RDBMS, is there any tool which supports incremental import from HBase? On Wed,

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Chetan Khatri
Correct, so there are two options: the approach you suggested and the uber jar approach. I think the uber jar approach is the best practice, because if you wish to do an environment migration it would be easy, and performance-wise the uber jar approach would also be more optimised than the non-uber approach. Thanks. On

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
We remodel Spark dependencies and ours together and chuck them under the /jars path. There are other ways to do it but we want the classpath to be strictly as close to development as possible. --- Regards, Andy On Fri, Dec 23, 2016 at 6:00 PM, Chetan Khatri

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Chetan Khatri
Andy, thanks for the reply. If we download all the dependencies to a separate location and link them with the spark job jar on the spark cluster, is that the best way to execute a spark job? Thanks. On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang wrote: > I used to use uber jar in Spark 1.x because of

Re: Can't access the data in Kafka Spark Streaming globally

2016-12-23 Thread Cody Koeninger
This doesn't sound like a question regarding Kafka streaming, it sounds like confusion about the scope of variables in spark generally. Is that right? If so, I'd suggest reading the documentation, starting with a simple rdd (e.g. using sparkContext.parallelize), and experimenting to confirm your
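The scope confusion Cody describes can be reproduced without Spark at all. Spark serializes the variables captured by a task's closure and ships a copy to each executor, so mutating the copy never updates the driver-side original. A minimal plain-Python analogy (not Spark code; the function name is illustrative):

```python
import copy

counter = 0  # "driver-side" variable


def simulate_executor(partition, captured):
    # Spark ships a serialized copy of captured variables to each
    # executor; deepcopy stands in for that serialization boundary.
    captured = copy.deepcopy(captured)
    for _ in partition:
        captured += 1  # mutates only the executor's copy
    return captured


executor_view = simulate_executor(range(5), counter)
print(executor_view)  # 5
print(counter)        # 0: the driver-side variable is unchanged
```

In real Spark code, a shared counter belongs in an Accumulator, and read-only shared state belongs in a broadcast variable.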

Re: Is there any scheduled release date for Spark 2.1.0?

2016-12-23 Thread Justin Miller
I'm curious about this as well. Seems like the vote passed. > On Dec 23, 2016, at 2:00 AM, Aseem Bansal wrote: > > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
I used to use an uber jar in Spark 1.x because of classpath issues (we couldn't re-model our dependencies based on our code, and thus the cluster's runtime dependencies could be very different from running Spark directly in the IDE). We had to use the userClasspathFirst "hack" to work around this. With Spark 2,
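The thin-jar alternative Andy describes can be sketched as a spark-submit invocation: Spark itself stays out of the application jar and is provided by the cluster, while third-party dependencies are shipped explicitly. All paths and class names below are illustrative:

```shell
# Thin application jar; Spark comes from the cluster's own classpath.
# Third-party dependencies are listed explicitly (paths illustrative).
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --jars /opt/myapp/lib/dep-a.jar,/opt/myapp/lib/dep-b.jar \
  /opt/myapp/myjob-thin.jar
```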

ThreadPoolExecutor - slow spark job

2016-12-23 Thread geoHeil
Hi, I built a spark job which is very slow. ThreadPoolExecutor is executed for every second task of my custom spark pipeline step. Additionally, I noticed that spark is spending a lot of time in garbage collection, and sometimes 0 tasks are launched but the driver is still waiting. I put

Dependency Injection and Microservice development with Spark

2016-12-23 Thread Chetan Khatri
Hello Community, currently my approach for Spark job development is Scala + SBT and an uber jar, with a yml properties file to pass configuration parameters. But if I would like to use dependency injection and microservice development like the Spring Boot feature in Scala, then what would be the
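A yml properties file like the one mentioned above might look like the following; every key and value here is purely illustrative, not from the thread:

```yaml
# application.yml -- illustrative job configuration,
# loaded by the job at startup and passed to SparkConf
spark:
  master: yarn
  executor-memory: 4g
job:
  input-path: hdfs:///data/in
  output-path: hdfs:///data/out
```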

Re: parsing embedded json in spark

2016-12-23 Thread Tal Grynbaum
Hi Shaw, Thanks, that works! On Thu, Dec 22, 2016 at 6:45 PM, Shaw Liu wrote: > Hi, I guess you can use the 'get_json_object' function > On Thu, Dec 22, 2016 at 9:52 PM +0800, "Irving Duran" < > irving.du...@gmail.com>
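For readers landing here: Spark SQL's `get_json_object` extracts a value from a JSON string column by a `$.`-style path. The same idea in plain Python, as a simplified sketch that supports only dotted key paths (the real function supports richer JSONPath expressions):

```python
import json


def get_json_object(json_str, path):
    # Walk a "$.a.b" style path through a parsed JSON object,
    # returning None for missing keys (Spark SQL returns NULL).
    obj = json.loads(json_str)
    for key in path.lstrip("$.").split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


print(get_json_object('{"user": {"name": "tal"}}', "$.user.name"))  # tal
```

In Spark SQL the equivalent call is `SELECT get_json_object(json_col, '$.user.name') FROM t`.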

Re: Ingesting data in elasticsearch from hdfs using spark , cluster setup and usage

2016-12-23 Thread Anastasios Zouzias
Hi Rohit, since your instances are 16 GB dual-core only, I would suggest using dedicated nodes for Elastic, with 8 GB for the Elastic heap memory. This way you won't have any interference between spark executors and Elastic. Also, if possible, you could try to use SSD disks on these 3 machines for
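On Elasticsearch releases current at the time of this thread (2.x), the dedicated 8 GB heap suggested above would be set via an environment variable before starting the node; the value is illustrative, and newer releases configure the heap in jvm.options instead:

```shell
# Give Elasticsearch a fixed 8 GB heap on the dedicated nodes
# (2.x-era variable; later versions use -Xms/-Xmx in jvm.options)
export ES_HEAP_SIZE=8g
echo "$ES_HEAP_SIZE"
```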

Is there any scheduled release date for Spark 2.1.0?

2016-12-23 Thread Aseem Bansal