Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
Hi Muthu, I am actually using the Play framework for my microservice, which uses Akka, but I still don't understand how SparkSession can use Akka to communicate with the Spark cluster. SparkPi or SparkPl? Any link? Thanks!

Re: SparkAppHandle.Listener.infoChanged behaviour

2017-06-04 Thread Marcelo Vanzin
On Sat, Jun 3, 2017 at 7:16 PM, Mohammad Tariq wrote: > I am having a bit of difficulty understanding the exact behaviour of the > SparkAppHandle.Listener.infoChanged(SparkAppHandle handle) method. The > documentation says: > > Callback for changes in any information that is

Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread Muthu Jayakumar
One drastic suggestion would be to write a simple microservice using Akka, create a SparkSession (during startup of the VM), and pass it around. You can look at SparkPi for sample source code to start writing your microservice. In my case, I used Akka HTTP to wrap my business requests and transform
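The pattern described above — build one expensive session at service startup and share it across all request handlers — can be sketched generically. This is a minimal sketch of the idea only: `SharedSession` and `QueryService` are hypothetical stand-ins (a real service would hold a SparkSession built via `SparkSession.builder.getOrCreate()` and wrap requests with Akka HTTP), and no Spark is required to run it:

```python
# Sketch of the "create once at startup, pass it around" pattern.
# SharedSession stands in for an expensive-to-create SparkSession.

class SharedSession:
    """Stands in for a costly session (e.g. a SparkSession connected to a cluster)."""
    def __init__(self):
        self.started = True  # imagine cluster-connection setup happening here

    def query(self, sql):
        # A real session would run `sql` against parquet data on HDFS.
        return f"result of: {sql}"


class QueryService:
    """Microservice-style handler that reuses one session for every request."""
    def __init__(self, session):
        self.session = session  # injected once at startup, reused per request

    def handle(self, request_sql):
        return self.session.query(request_sql)


# Startup: build the session once, then hand it to every handler.
session = SharedSession()
service = QueryService(session)
print(service.handle("SELECT count(*) FROM events"))
```

The design point is simply that session construction happens once, not per request; every handler borrows the same instance.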

Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread Sandeep Nemuri
Well, if you are using the Hortonworks distribution, there is Livy2, which is compatible with Spark2 and Scala 2.11. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_command-line-installation/content/install_configure_livy2.html On Sun, Jun 4, 2017 at 1:55 PM, kant kodali

Re: Is there a way to do conditional group by in spark 2.1.1?

2017-06-04 Thread Guy Cohen
Try this one: df.groupBy( when(expr("field1='foo'"),"field1").when(expr("field2='bar'"),"field2")) On Sun, Jun 4, 2017 at 3:16 AM, Bryan Jeffrey wrote: > You should be able to project a new column that is your group column. Then > you can group on the projected
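The conditional key built above with `when(expr("field1='foo'"),"field1").when(expr("field2='bar'"),"field2")` can be mirrored in plain Python to see which key each row receives. This is a sketch of the logic only, not Spark itself; the column names come from the thread, and the sample rows are hypothetical:

```python
from collections import defaultdict

def group_key(row):
    # Mirrors the chained when(...) expression: the first matching branch
    # supplies the grouping column's name; with no otherwise(...), Spark
    # yields null for unmatched rows, modeled here as None.
    if row["field1"] == "foo":
        return "field1"
    if row["field2"] == "bar":
        return "field2"
    return None

rows = [
    {"field1": "foo", "field2": "x"},
    {"field1": "y",   "field2": "bar"},
    {"field1": "z",   "field2": "w"},
]

groups = defaultdict(list)
for row in rows:
    groups[group_key(row)].append(row)

print(sorted(k if k is not None else "null" for k in groups))
# → ['field1', 'field2', 'null']
```

Note that unmatched rows all land in a single null group, which may or may not be what you want; in Spark you could filter them out first or add an `.otherwise(...)` branch.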

Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-04 Thread khwunchai jaengsawang
Hi Abdulfattah, Make sure you have enough resources available when submitting the application; it seems like Spark is waiting for enough resources. Best, Khwunchai Jaengsawang Email: khwuncha...@ku.th Mobile: +66 88 228 1715 LinkedIn | Github

Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-04 Thread Abdulfattah Safa
I'm working on Spark in Standalone Cluster mode. I need to increase the Driver Memory as I got an OOM in the driver thread. I found that when setting the Driver Memory > Executor Memory, the submitted job is stuck at SUBMITTED in the driver and the application never starts.
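One likely cause of the SUBMITTED hang: in standalone cluster mode the driver itself is launched on a worker, so a single worker must have at least `--driver-memory` free in addition to what executors request; if no worker does, the master leaves the driver waiting. A hedged example submission (master host, class name, and sizes are hypothetical):

```
# Hypothetical standalone-cluster submission. With --deploy-mode cluster the
# driver runs on a worker, so some worker needs >= 8g free for the driver
# alone, on top of the executor memory requested below.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --driver-memory 8g \
  --executor-memory 4g \
  --class com.example.MyApp \
  my-app.jar
```

Checking free memory per worker in the standalone master web UI (port 8080 by default) will show whether any worker can actually host an 8g driver.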

Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
Hi, thanks for this, but here is what the documentation says: "To run the Livy server, you will also need an Apache Spark installation. You can get Spark releases at https://spark.apache.org/downloads.html. Livy requires at least Spark 1.4 and currently only supports Scala 2.10 builds of Spark.

Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread Sandeep Nemuri
Check out http://livy.io/ On Sun, Jun 4, 2017 at 11:59 AM, kant kodali wrote: > Hi All, > > I am wondering what is the easiest way for a microservice to query data > on HDFS? By easiest way I mean using a minimal number of tools. > > Currently I use spark structured
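Livy fits the "minimal number of tools" requirement because it exposes Spark over REST, so the microservice only needs an HTTP client. A hedged sketch of Livy's batch API (host, jar path, and class name are hypothetical; the server is assumed to be listening on its default port 8998):

```
# Submit a Spark job as a Livy batch.
curl -s -X POST http://livy-host:8998/batches \
  -H 'Content-Type: application/json' \
  -d '{
        "file": "hdfs:///apps/my-app.jar",
        "className": "com.example.MyApp",
        "executorMemory": "4g"
      }'

# The response includes a batch id; poll its state with it.
curl -s http://livy-host:8998/batches/0
```

For interactive queries rather than batch jobs, Livy also offers a sessions API (POST /sessions, then POST /sessions/{id}/statements with code to run), which is closer to the "query parquet on demand" use case in this thread.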

What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
Hi All, I am wondering what the easiest way is for a microservice to query data on HDFS. By easiest way I mean using a minimal number of tools. Currently I use Spark structured streaming to do some real-time aggregations and store them in HDFS. But now, I want my microservice app to be able to