Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread Muthu Jayakumar
Hello Kant,

>I still don't understand how SparkSession can use Akka to communicate with
the Spark cluster?
Let me use your initial requirement as a way to illustrate what I mean --
i.e., "I want my Micro service app to be able to query and access data on
HDFS".
In order to run a query, say a DataFrame query (equally possible with SQL),
you'll need a SparkSession to build the query, right? If you can have your
main thread launched in client mode (
https://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications)
then you'll be able to use your Play/Akka-based microservice as usual.
Here is what one of my applications does:
a. I have an akka-http microservice that takes a query-like JSON request
(based on a simple Scala parser combinator), runs a Spark job using
DataFrame/Dataset, and sends back JSON responses (synchronously or
asynchronously).
b. Another akka actor takes an object request and generates Parquet file(s).
c. Another akka-http endpoint (based on WebSockets) performs a similar
operation to (a).
d. Another akka-http endpoint reports progress on a running query / Parquet
generation (based on SparkContext / Spark SQL internal APIs, similar to
https://spark.apache.org/docs/latest/monitoring.html).
The idea is to make sure there is only one SparkSession per JVM, but you can
set the scheduler mode to FAIR (it defaults to FIFO) to be able to run
multiple queries in parallel. The application I work on runs Spark
Standalone on a 32-node cluster.
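A minimal sketch of the pattern described above (the app name, master URL, and
paths are illustrative placeholders, not from my application): one SparkSession
built once per JVM, with the FAIR scheduler enabled so concurrent requests from
the microservice can share it.

```scala
import org.apache.spark.sql.SparkSession

// One SparkSession for the whole JVM, created lazily at service startup.
object SparkHolder {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("query-microservice")          // illustrative name
    .master("spark://master-host:7077")     // placeholder Standalone master
    .config("spark.scheduler.mode", "FAIR") // default is FIFO; FAIR interleaves jobs
    .getOrCreate()
}

// Every HTTP request handler reuses the same session to run a DataFrame query.
def runQuery(parquetPath: String): Array[String] = {
  val df = SparkHolder.spark.read.parquet(parquetPath)
  df.toJSON.take(10) // return a small JSON sample to the caller
}
```

With FAIR scheduling, requests arriving on different handler threads can have
their Spark jobs run in parallel instead of queuing behind one another.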

Hope this gives some better idea.

Thanks,
Muthu


On Sun, Jun 4, 2017 at 10:33 PM, kant kodali  wrote:

> Hi Muthu,
>
> I am actually using the Play framework for my microservice, which uses Akka,
> but I still don't understand how SparkSession can use Akka to communicate
> with the Spark cluster. SparkPi or SparkPl? Any link?
>
> Thanks!
>


Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
Hi Muthu,

I am actually using the Play framework for my microservice, which uses Akka,
but I still don't understand how SparkSession can use Akka to communicate
with the Spark cluster. SparkPi or SparkPl? Any link?

Thanks!


Re: SparkAppHandle.Listener.infoChanged behaviour

2017-06-04 Thread Marcelo Vanzin
On Sat, Jun 3, 2017 at 7:16 PM, Mohammad Tariq  wrote:
> I am having a bit of difficulty in understanding the exact behaviour of the
> SparkAppHandle.Listener.infoChanged(SparkAppHandle handle) method. The
> documentation says:
>
> Callback for changes in any information that is not the handle's state.
>
> What exactly is meant by "any information" here? Apart from state, the only
> other piece of information I can see is the ID.

So, you answered your own question.

If there's ever any new kind of information, it would use the same event.
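For reference, a sketch of wiring up both callbacks (the listener interface and
its methods are the real launcher API; the jar path, main class, and master are
placeholders):

```scala
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Listener that logs both kinds of callbacks. As noted above, today the only
// non-state information that changes is the application ID.
val listener = new SparkAppHandle.Listener {
  override def stateChanged(handle: SparkAppHandle): Unit =
    println(s"state changed: ${handle.getState}")
  override def infoChanged(handle: SparkAppHandle): Unit =
    println(s"info changed, app id: ${handle.getAppId}") // fires once the ID is assigned
}

val handle: SparkAppHandle = new SparkLauncher()
  .setAppResource("/path/to/app.jar") // placeholder
  .setMainClass("com.example.Main")   // placeholder
  .setMaster("yarn")                  // placeholder
  .startApplication(listener)
```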

-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread Muthu Jayakumar
One drastic suggestion would be to write a simple microservice using Akka,
create a SparkSession (during the start of the VM), and pass it around. You
can look at SparkPi for sample source code to start writing your
microservice. In my case, I used akka-http to wrap my business requests and
transform them into Parquet reads, responding with the results.
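A rough sketch of that shape (all names and paths are illustrative; the
akka-http routing plumbing is elided):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Created once when the VM starts, then handed to every request handler.
val spark: SparkSession = SparkSession.builder()
  .appName("parquet-query-service") // illustrative
  .getOrCreate()

// A business request is translated into a Parquet read plus a filter
// expression, and the result rows are returned as JSON strings.
def handleRequest(spark: SparkSession, path: String, filterExpr: String): Seq[String] = {
  val df: DataFrame = spark.read.parquet(path).filter(filterExpr)
  df.toJSON.collect().toSeq
}
```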
Hope this helps

Thanks
Muthu


On Mon, Jun 5, 2017, 01:01 Sandeep Nemuri  wrote:

> Well, if you are using the Hortonworks distribution, there is Livy2, which
> is compatible with Spark 2 and Scala 2.11.
>
>
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_command-line-installation/content/install_configure_livy2.html
>
>
> On Sun, Jun 4, 2017 at 1:55 PM, kant kodali  wrote:
>
>> Hi,
>>
>> Thanks for this but here is what the documentation says:
>>
>> "To run the Livy server, you will also need an Apache Spark
>> installation. You can get Spark releases at
>> https://spark.apache.org/downloads.html. Livy requires at least Spark
>> 1.4 and currently only supports Scala 2.10 builds of Spark. To run Livy
>> with local sessions, first export these variables:"
>>
>> I am using Spark 2.1.1 and Scala 2.11.8, and I would like to use the
>> DataFrame and Dataset APIs, so it sounds like this is not an option for me?
>>
>> Thanks!
>>
>> On Sun, Jun 4, 2017 at 12:23 AM, Sandeep Nemuri 
>> wrote:
>>
>>> Check out http://livy.io/
>>>
>>>
>>> On Sun, Jun 4, 2017 at 11:59 AM, kant kodali  wrote:
>>>
 Hi All,

 I am wondering what is the easiest way for a Micro service to query
 data on HDFS? By easiest way I mean using minimal number of tools.

 Currently I use spark structured streaming to do some real time
 aggregations and store it in HDFS. But now, I want my Micro service app to
 be able to query and access data on HDFS. It looks like SparkSession can
 only be accessed through CLI but not through a JDBC like API or whatever.
 Any suggestions?

 Thanks!

>>>
>>>
>>>
>>> --
>>> *  Regards*
>>> *  Sandeep Nemuri*
>>>
>>
>>
>
>
> --
> *  Regards*
> *  Sandeep Nemuri*
>


Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread Sandeep Nemuri
Well, if you are using the Hortonworks distribution, there is Livy2, which is
compatible with Spark 2 and Scala 2.11.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_command-line-installation/content/install_configure_livy2.html


On Sun, Jun 4, 2017 at 1:55 PM, kant kodali  wrote:

> Hi,
>
> Thanks for this but here is what the documentation says:
>
> "To run the Livy server, you will also need an Apache Spark installation.
> You can get Spark releases at https://spark.apache.org/downloads.html.
> Livy requires at least Spark 1.4 and currently only supports Scala 2.10
> builds of Spark. To run Livy with local sessions, first export these
> variables:"
>
> I am using Spark 2.1.1 and Scala 2.11.8, and I would like to use the
> DataFrame and Dataset APIs, so it sounds like this is not an option for me?
>
> Thanks!
>
> On Sun, Jun 4, 2017 at 12:23 AM, Sandeep Nemuri 
> wrote:
>
>> Check out http://livy.io/
>>
>>
>> On Sun, Jun 4, 2017 at 11:59 AM, kant kodali  wrote:
>>
>>> Hi All,
>>>
>>> I am wondering what is the easiest way for a Micro service to query data
>>> on HDFS? By easiest way I mean using minimal number of tools.
>>>
>>> Currently I use spark structured streaming to do some real time
>>> aggregations and store it in HDFS. But now, I want my Micro service app to
>>> be able to query and access data on HDFS. It looks like SparkSession can
>>> only be accessed through CLI but not through a JDBC like API or whatever.
>>> Any suggestions?
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> *  Regards*
>> *  Sandeep Nemuri*
>>
>
>


-- 
*  Regards*
*  Sandeep Nemuri*


Re: Is there a way to do conditional group by in spark 2.1.1?

2017-06-04 Thread Guy Cohen
Try this one:

df.groupBy(
  when(expr("field1='foo'"),"field1").when(expr("field2='bar'"),"field2"))
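One way to make the grouping key explicit is to project it as its own column
first and then group on that column. A sketch, assuming a DataFrame `df` with
the illustrative field names from the question:

```scala
import org.apache.spark.sql.functions.{col, when}

// Project a single grouping key, then group on it. Using col(...) as the
// when(...) value groups by the field's value; a string literal there would
// group by the literal name instead.
val keyed = df.withColumn("groupKey",
  when(col("field1") === "foo", col("field1"))
    .when(col("field2") === "bar", col("field2")))

// Rows matching neither condition get a null key and fall into one group.
val grouped = keyed.groupBy(col("groupKey")).count()
```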


On Sun, Jun 4, 2017 at 3:16 AM, Bryan Jeffrey 
wrote:

> You should be able to project a new column that is your group column. Then
> you can group on the projected column.
>
>
>
>
>
> On Sat, Jun 3, 2017 at 6:26 PM -0400, "upendra 1991" <
> upendra1...@yahoo.com.invalid> wrote:
>
> Use a function
>>
>>
>> On Sat, Jun 3, 2017 at 5:01 PM, kant kodali
>>  wrote:
>> Hi All,
>>
>> Is there a way to do a conditional group by in Spark 2.1.1? In other
>> words, I want to do something like this:
>>
>> if (field1 == "foo") {
>>   df.groupBy(field1)
>> } else if (field2 == "bar") {
>>   df.groupBy(field2)
>> }
>>
>> Thanks
>>
>>


Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-04 Thread khwunchai jaengsawang


Hi Abdulfattah,

Make sure you have enough resources available when submitting the
application; it seems like Spark is waiting for sufficient resources.


Best,

  Khwunchai Jaengsawang
  Email: khwuncha...@ku.th
  Mobile: +66 88 228 1715



> On Jun 4, 2560 BE, at 6:51 PM, Abdulfattah Safa  wrote:
> 
> I'm working on Spark in Standalone Cluster mode. I need to increase the
> driver memory as I got an OOM in the driver thread. I found that when setting
> the driver memory greater than the executor memory, the submitted job is
> stuck at SUBMITTED and the application never starts.



Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-04 Thread Abdulfattah Safa
I'm working on Spark in Standalone Cluster mode. I need to increase the
driver memory as I got an OOM in the driver thread. I found that when
setting the driver memory greater than the executor memory, the submitted
job is stuck at SUBMITTED and the application never starts.
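For context, the driver and executor memory being described are typically set
at submission time; a sketch of such a submission (master URL, sizes, class,
and jar path are all illustrative):

```shell
# Standalone cluster: driver memory set larger than executor memory,
# which is the configuration that triggers the stuck-at-SUBMITTED symptom.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --driver-memory 8g \
  --executor-memory 4g \
  --class com.example.Main \
  /path/to/app.jar
```

In cluster deploy mode the driver itself must be scheduled onto a worker with
that much free memory, so the job stays SUBMITTED until some worker can fit it.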



Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
Hi,

Thanks for this but here is what the documentation says:

"To run the Livy server, you will also need an Apache Spark installation.
You can get Spark releases at https://spark.apache.org/downloads.html. Livy
requires at least Spark 1.4 and currently only supports Scala 2.10 builds
of Spark. To run Livy with local sessions, first export these variables:"

I am using Spark 2.1.1 and Scala 2.11.8, and I would like to use the
DataFrame and Dataset APIs, so it sounds like this is not an option for me?

Thanks!

On Sun, Jun 4, 2017 at 12:23 AM, Sandeep Nemuri 
wrote:

> Check out http://livy.io/
>
>
> On Sun, Jun 4, 2017 at 11:59 AM, kant kodali  wrote:
>
>> Hi All,
>>
>> I am wondering what is the easiest way for a Micro service to query data
>> on HDFS? By easiest way I mean using minimal number of tools.
>>
>> Currently I use spark structured streaming to do some real time
>> aggregations and store it in HDFS. But now, I want my Micro service app to
>> be able to query and access data on HDFS. It looks like SparkSession can
>> only be accessed through CLI but not through a JDBC like API or whatever.
>> Any suggestions?
>>
>> Thanks!
>>
>
>
>
> --
> *  Regards*
> *  Sandeep Nemuri*
>


Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread Sandeep Nemuri
Check out http://livy.io/
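Livy's appeal here is that the microservice talks to Spark over plain REST
instead of holding a SparkSession itself. A minimal sketch against a
hypothetical Livy server (the endpoint paths follow Livy's documented REST API;
the host, session/statement IDs, and code payload are illustrative):

```shell
# Start an interactive session; Livy launches Spark on the cluster for you.
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"kind": "spark"}' \
  http://livy-host:8998/sessions

# Once the session is idle, submit Scala code that reads Parquet from HDFS.
# (Newer Livy sessions expose `spark`; very old ones only `sc`/`sqlContext`.)
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"code": "spark.read.parquet(\"hdfs:///data/agg\").count()"}' \
  http://livy-host:8998/sessions/0/statements

# Poll for the result of statement 0 in session 0.
curl -s http://livy-host:8998/sessions/0/statements/0
```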


On Sun, Jun 4, 2017 at 11:59 AM, kant kodali  wrote:

> Hi All,
>
> I am wondering what is the easiest way for a microservice to query data
> on HDFS? By easiest I mean using a minimal number of tools.
>
> Currently I use Spark Structured Streaming to do some real-time
> aggregations and store them in HDFS. But now I want my microservice app to
> be able to query and access data on HDFS. It looks like SparkSession can
> only be accessed through the CLI, not through a JDBC-like API or similar.
> Any suggestions?
>
> Thanks!
>



-- 
*  Regards*
*  Sandeep Nemuri*