Re: Write Spark Connection client application in Go

2023-09-14 Thread bo yang
Thanks Holden and Martin for the nice words and feedback :)

On Wed, Sep 13, 2023 at 8:22 AM Martin Grund  wrote:

> This is absolutely awesome! Thank you so much for dedicating your time to
> this project!
>
>
> On Wed, Sep 13, 2023 at 6:04 AM Holden Karau  wrote:
>
>> That’s so cool! Great work y’all :)
>>
>> On Tue, Sep 12, 2023 at 8:14 PM bo yang  wrote:
>>
>>> Hi Spark Friends,
>>>
>>> Anyone interested in using Golang to write Spark application? We created
>>> a Spark Connect Go Client library
>>> <https://github.com/apache/spark-connect-go>. Would love to hear
>>> feedback/thoughts from the community.
>>>
>>> Please see the quick start guide
>>> <https://github.com/apache/spark-connect-go/blob/master/quick-start.md>
>>> about how to use it. Following is a very short Spark Connect application in
>>> Go:
>>>
>>> func main() {
>>> spark, _ := 
>>> sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
>>> defer spark.Stop()
>>>
>>> df, _ := spark.Sql("select 'apple' as word, 123 as count union all 
>>> select 'orange' as word, 456 as count")
>>> df.Show(100, false)
>>> df.Collect()
>>>
>>> df.Write().Mode("overwrite").
>>> Format("parquet").
>>> Save("file:///tmp/spark-connect-write-example-output.parquet")
>>>
>>> df = spark.Read().Format("parquet").
>>> Load("file:///tmp/spark-connect-write-example-output.parquet")
>>> df.Show(100, false)
>>>
>>> df.CreateTempView("view1", true, false)
>>> df, _ = spark.Sql("select count, word from view1 order by count")
>>> }
>>>
>>>
>>> Many thanks to Martin, Hyukjin, Ruifeng and Denny for creating and
>>> working together on this repo! Welcome more people to contribute :)
>>>
>>> Best,
>>> Bo
>>>
>>>


Re: Write Spark Connection client application in Go

2023-09-13 Thread Martin Grund
This is absolutely awesome! Thank you so much for dedicating your time to
this project!


On Wed, Sep 13, 2023 at 6:04 AM Holden Karau  wrote:

> That’s so cool! Great work y’all :)
>
> On Tue, Sep 12, 2023 at 8:14 PM bo yang  wrote:
>
>> Hi Spark Friends,
>>
>> Anyone interested in using Golang to write Spark application? We created
>> a Spark Connect Go Client library
>> <https://github.com/apache/spark-connect-go>. Would love to hear
>> feedback/thoughts from the community.
>>
>> Please see the quick start guide
>> <https://github.com/apache/spark-connect-go/blob/master/quick-start.md>
>> about how to use it. Following is a very short Spark Connect application in
>> Go:
>>
>> func main() {
>>  spark, _ := 
>> sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
>>  defer spark.Stop()
>>
>>  df, _ := spark.Sql("select 'apple' as word, 123 as count union all 
>> select 'orange' as word, 456 as count")
>>  df.Show(100, false)
>>  df.Collect()
>>
>>  df.Write().Mode("overwrite").
>>  Format("parquet").
>>  Save("file:///tmp/spark-connect-write-example-output.parquet")
>>
>>  df = spark.Read().Format("parquet").
>>  Load("file:///tmp/spark-connect-write-example-output.parquet")
>>  df.Show(100, false)
>>
>>  df.CreateTempView("view1", true, false)
>>  df, _ = spark.Sql("select count, word from view1 order by count")
>> }
>>
>>
>> Many thanks to Martin, Hyukjin, Ruifeng and Denny for creating and
>> working together on this repo! Welcome more people to contribute :)
>>
>> Best,
>> Bo
>>
>>


Re: Write Spark Connection client application in Go

2023-09-12 Thread Holden Karau
That’s so cool! Great work y’all :)

On Tue, Sep 12, 2023 at 8:14 PM bo yang  wrote:

> Hi Spark Friends,
>
> Anyone interested in using Golang to write Spark application? We created a 
> Spark
> Connect Go Client library <https://github.com/apache/spark-connect-go>.
> Would love to hear feedback/thoughts from the community.
>
> Please see the quick start guide
> <https://github.com/apache/spark-connect-go/blob/master/quick-start.md>
> about how to use it. Following is a very short Spark Connect application in
> Go:
>
> func main() {
>   spark, _ := 
> sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
>   defer spark.Stop()
>
>   df, _ := spark.Sql("select 'apple' as word, 123 as count union all 
> select 'orange' as word, 456 as count")
>   df.Show(100, false)
>   df.Collect()
>
>   df.Write().Mode("overwrite").
>   Format("parquet").
>   Save("file:///tmp/spark-connect-write-example-output.parquet")
>
>   df = spark.Read().Format("parquet").
>   Load("file:///tmp/spark-connect-write-example-output.parquet")
>   df.Show(100, false)
>
>   df.CreateTempView("view1", true, false)
>   df, _ = spark.Sql("select count, word from view1 order by count")
> }
>
>
> Many thanks to Martin, Hyukjin, Ruifeng and Denny for creating and working
> together on this repo! Welcome more people to contribute :)
>
> Best,
> Bo
>
>


Write Spark Connection client application in Go

2023-09-12 Thread bo yang
Hi Spark Friends,

Anyone interested in using Golang to write Spark applications? We created a
Spark Connect Go client library <https://github.com/apache/spark-connect-go>.
We would love to hear feedback/thoughts from the community.

Please see the quick start guide
<https://github.com/apache/spark-connect-go/blob/master/quick-start.md>
for how to use it. The following is a very short Spark Connect application in Go:

func main() {
    // Connect to a local Spark Connect server (default port 15002).
    spark, _ := sql.SparkSession.Builder.Remote("sc://localhost:15002").Build()
    defer spark.Stop()

    // Run a SQL query; errors are ignored here for brevity.
    df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
    df.Show(100, false)
    df.Collect()

    // Write the result as Parquet, then read it back.
    df.Write().Mode("overwrite").
        Format("parquet").
        Save("file:///tmp/spark-connect-write-example-output.parquet")

    df = spark.Read().Format("parquet").
        Load("file:///tmp/spark-connect-write-example-output.parquet")
    df.Show(100, false)

    // Register a temporary view and query it.
    df.CreateTempView("view1", true, false)
    df, _ = spark.Sql("select count, word from view1 order by count")
}


Many thanks to Martin, Hyukjin, Ruifeng and Denny for creating and working
together on this repo! We welcome more contributors :)

Best,
Bo


Re: [Spark] spark client for Hadoop 2.x

2022-04-06 Thread Morven Huang
I remember that ./dev/make-distribution.sh in the Spark source allows people to
specify the Hadoop version.

> On Apr 6, 2022, at 4:31 PM, Amin Borjian wrote:
> 
> From Spark version 3.1.0 onwards, the clients provided for Spark are built 
> with Hadoop 3 and placed in maven Repository. Unfortunately  we use Hadoop 
> 2.7.7 in our infrastructure currently.
>  
> 1) Does Spark have a plan to publish the Spark client dependencies for Hadoop 
> 2.x?
> 2) Are the new Spark clients capable of connecting to the Hadoop 2.x cluster? 
> (According to a simple test, Spark client 3.2.1 had no problem with the 
> Hadoop 2.7 cluster but we wanted to know if there was any guarantee from 
> Spark?)
>  
> Thank you very much in advance
> Amin Borjian



[Spark] spark client for Hadoop 2.x

2022-04-06 Thread Amin Borjian
From Spark version 3.1.0 onwards, the clients provided for Spark are built
with Hadoop 3 and placed in the Maven repository. Unfortunately, we currently
use Hadoop 2.7.7 in our infrastructure.

1) Does Spark have a plan to publish the Spark client dependencies for Hadoop
2.x?
2) Are the new Spark clients capable of connecting to a Hadoop 2.x cluster?
(According to a simple test, Spark client 3.2.1 had no problem with a Hadoop
2.7 cluster, but we wanted to know if there is any guarantee from Spark.)

Thank you very much in advance
Amin Borjian


Re: Spark standalone , client mode. How do I monitor?

2017-06-29 Thread Nirav Patel
You can use Ganglia, Ambari, or Nagios to monitor Spark workers/masters.
Spark executors are resilient. There are also many proprietary software companies
that do Hadoop application monitoring.



On Tue, Jun 27, 2017 at 5:03 PM, anna stax  wrote:

> Hi all,
>
> I have a spark standalone cluster. I am running a spark streaming
> application on it and the deploy mode is client. I am looking for the best
> way to monitor the cluster and application so that I will know when the
> application/cluster is down. I cannot move to cluster deploy mode now.
>
> I appreciate your thoughts.
>
> Thanks
> -Anna
>




Spark standalone , client mode. How do I monitor?

2017-06-27 Thread anna stax
Hi all,

I have a spark standalone cluster. I am running a spark streaming
application on it and the deploy mode is client. I am looking for the best
way to monitor the cluster and application so that I will know when the
application/cluster is down. I cannot move to cluster deploy mode now.

I appreciate your thoughts.

Thanks
-Anna


spark streaming client program needs to be restarted after few hours of idle time. how can I fix it?

2016-10-18 Thread kant kodali
Hi Guys,

My Spark Streaming client program works fine as long as the receiver is
receiving data. But say my receiver has no data to receive for a few hours
(4-5 hours) and then starts receiving data again; at that point the Spark
client program doesn't seem to process any data. It needs to be restarted, in
which case everything seems to work fine again. I am using Spark standalone
mode, and my client program has the following lines at the end so that it runs
forever. Any ideas what can go wrong? I have some potential suspects and I
will share them after a bit of experimentation on my end.

Thanks!

ssc.start();
ssc.awaitTermination();
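
For context, here is a minimal Scala sketch of where those two lines typically
sit in a receiver-based streaming program. The app name, master URL, batch
interval, and socket source below are illustrative assumptions, not details of
the original program:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSkeleton {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and standalone master URL.
    val conf = new SparkConf().setAppName("streaming-skeleton").setMaster("spark://master:7077")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Any receiver-based source; a socket stream is used here purely for illustration.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()            // start the receiver and the processing
    ssc.awaitTermination() // block so the driver keeps running "forever"
  }
}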


Re: High virtual memory consumption on spark-submit client.

2016-05-13 Thread jone
No, I have set master to yarn-cluster.
When SparkPi is running, the result of free -t is as follows:
[running]mqq@10.205.3.29:/data/home/hive/conf$ free -t
 total   used   free shared    buffers cached
Mem:  32740732   32105684 635048  0 683332   28863456
-/+ buffers/cache:    2558896   30181836
Swap:  2088952  60320    2028632
Total:    34829684   32166004    2663680
After SparkPi succeeds, the result is as follows:
[running]mqq@10.205.3.29:/data/home/hive/conf$ free -t
 total   used   free shared    buffers cached
Mem:  32740732   31614452    1126280  0 683624   28863096
-/+ buffers/cache:    2067732   30673000
Swap:  2088952  60320    2028632
Total:    34829684   31674772    3154912
On 13 May 2016 at 14:47, Mich Talebzadeh wrote:

Is this a standalone setup on a single host where the executor runs inside the driver? Also run

free -t

to see the virtual memory usage, which is basically swap space:

free -t
             total       used       free     shared    buffers     cached
Mem:      24546308   24268760     277548          0    1088236   15168668
-/+ buffers/cache:    8011856   16534452
Swap:      2031608        304    2031304
Total:    26577916   24269064    2308852

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 


On 13 May 2016 at 07:36, Jone Zhang  wrote:

mich, Do you want this
==
[running]mqq@10.205.3.29:/data/home/hive/conf$ ps aux | grep SparkPi
mqq      20070  3.6  0.8 10445048 267028 pts/16 Sl+ 13:09   0:11
/data/home/jdk/bin/java
-Dlog4j.configuration=file:///data/home/spark/conf/log4j.properties
-cp /data/home/spark/lib/*:/data/home/hadoop/share/hadoop/common/*:/data/home/hadoop/share/hadoop/common/lib/*:/data/home/hadoop/share/hadoop/yarn/*:/data/home/hadoop/share/hadoop/yarn/lib/*:/data/home/hadoop/share/hadoop/hdfs/*:/data/home/hadoop/share/hadoop/hdfs/lib/*:/data/home/hadoop/share/hadoop/tools/*:/data/home/hadoop/share/hadoop/mapreduce/*:/data/home/spark/conf/:/data/home/spark/lib/spark-assembly-1.4.1-hadoop2.5.1_150903.jar:/data/home/spark/lib/datanucleus-api-jdo-3.2.6.jar:/data/home/spark/lib/datanucleus-core-3.2.10.jar:/data/home/spark/lib/datanucleus-rdbms-3.2.9.jar:/data/home/hadoop/conf/:/data/home/hadoop/conf/:/data/home/spark/lib/*:/data/home/hadoop/share/hadoop/common/*:/data/home/hadoop/share/hadoop/common/lib/*:/data/home/hadoop/share/hadoop/yarn/*:/data/home/hadoop/share/hadoop/yarn/lib/*:/data/home/hadoop/share/hadoop/hdfs/*:/data/home/hadoop/share/hadoop/hdfs/lib/*:/data/home/hadoop/share/hadoop/tools/*:/data/home/hadoop/share/hadoop/mapreduce/*
-XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master
yarn-cluster --class org.apache.spark.examples.SparkPi --queue spark
--num-executors 4
/data/home/spark/lib/spark-examples-1.4.1-hadoop2.5.1.jar 1
mqq      22410  0.0  0.0 110600  1004 pts/8    S+   13:14   0:00 grep SparkPi
[running]mqq@10.205.3.29:/data/home/hive/conf$ top -p 20070

top - 13:14:48 up 504 days, 19:17, 19 users,  load average: 1.41, 1.10, 0.99
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 18.1%us,  2.7%sy,  0.0%ni, 74.4%id,  4.5%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  32740732k total, 31606288k used,  113k free,   475908k buffers
Swap:  2088952k total,    61076k used,  2027876k free, 27594452k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20070 mqq       20   0 10.0g 260m  32m S  0.0  0.8   0:11.38 java
==

Harsh, physical cpu cores is 1, virtual cpu cores is 4

Thanks.

2016-05-13 13:08 GMT+08:00, Harsh J :
> How many CPU cores are on that machine? Read http://qr.ae/8Uv3Xq
>
> You can also confirm the above by running the pmap utility on your process
> and most of the virtual memory would be under 'anon'.
>
> On Fri, 13 May 2016 09:11 jone,  wrote:
>
>> The virtual memory is 9G When i run org.apache.spark.examples.SparkPi
>> under yarn-cluster model,which using default configurations.
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>
>> 4519 mqq       20   0 9041m 248m  26m S  0.3  0.8   0:19.85 java
>>  I am curious why is so high?
>>
>> Thanks.
>>
>





Re: High virtual memory consumption on spark-submit client.

2016-05-12 Thread Harsh J
How many CPU cores are on that machine? Read http://qr.ae/8Uv3Xq

You can also confirm the above by running the pmap utility on your process
and most of the virtual memory would be under 'anon'.

On Fri, 13 May 2016 09:11 jone,  wrote:

> The virtual memory is 9G When i run org.apache.spark.examples.SparkPi
> under yarn-cluster model,which using default configurations.
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>
> 4519 mqq   20   0 9041m 248m  26m S  0.3  0.8   0:19.85 java
>  I am curious why is so high?
>
> Thanks.
>


Re: High virtual memory consumption on spark-submit client.

2016-05-12 Thread Mich Talebzadeh
can you please do the following:

jps|grep SparkSubmit

and send the output of

ps aux|grep pid
top -p PID

and the output of

free

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 13 May 2016 at 04:40, jone  wrote:

> The virtual memory is 9G When i run org.apache.spark.examples.SparkPi
> under yarn-cluster model,which using default configurations.
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>
> 4519 mqq   20   0 9041m 248m  26m S  0.3  0.8   0:19.85 java
>  I am curious why is so high?
>
> Thanks.
>


High virtual memory consumption on spark-submit client.

2016-05-12 Thread jone
The virtual memory is 9G when I run org.apache.spark.examples.SparkPi in yarn-cluster mode with the default configurations.
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
 4519 mqq   20   0 9041m 248m  26m S  0.3  0.8   0:19.85 java  
I am curious why it is so high?
Thanks.


Re: Spark for client

2016-03-01 Thread Todd Nist
You could also look at Apache Toree, http://toree.apache.org/
(GitHub: https://github.com/apache/incubator-toree). This used to be the
Spark Kernel from IBM but has been contributed to Apache.

There is a good overview of its features here:
http://www.spark.tc/how-to-enable-interactive-applications-against-apache-spark/.
Specifically, see this section on usage:

Usage

Using the kernel as the backbone of communication, we have enabled several
higher-level applications to interact with Apache Spark:

-> Livesheets
<https://www.youtube.com/watch?v=2AX6g0tK-us=youtu.be=37m42s>, a
line-of-business tool for data exploration

-> A RESTful query engine
<https://www.youtube.com/watch?v=2AX6g0tK-us=youtu.be=43m10s> running
on top of Spark SQL

-> A demonstration of a PHP application utilizing Apache Spark
<https://www.youtube.com/watch?v=TD1J7MzYcFo=youtu.be=33m19s> at
ZendCon 2014

-> An IPython notebook
<https://www.youtube.com/watch?v=2AX6g0tK-us=youtu.be=37m42s> running
the Spark Kernel underneath

HTH.
Todd

On Tue, Mar 1, 2016 at 4:10 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Mohannad.
>
> Installed Anaconda 3 that contains Jupyter. Now I want to access Spark on
> Scala from Jupyter. What is the easiest way of doing it without using
> Python!
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 1 March 2016 at 08:18, Mohannad Ali <man...@gmail.com> wrote:
>
>> Jupyter (http://jupyter.org/) also supports Spark and generally it's a
>> beast allows you to do so much more.
>> On Mar 1, 2016 00:25, "Mich Talebzadeh" <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Thank you very much both
>>>
>>> Zeppelin looks promising. Basically as I understand runs an agent on a
>>> given port (I chose 21999) on the host that Spark is installed. I created a
>>> notebook and running scripts through there. One thing for sure notebook
>>> just returns the results rather all other stuff that one does not need/.
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 29 February 2016 at 19:22, Minudika Malshan <minudika...@gmail.com>
>>> wrote:
>>>
>>>> +Adding resources
>>>> https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html
>>>> https://zeppelin.incubator.apache.org
>>>>
>>>> Minudika Malshan
>>>> Undergraduate
>>>> Department of Computer Science and Engineering
>>>> University of Moratuwa.
>>>> *Mobile : +94715659887 <%2B94715659887>*
>>>> *LinkedIn* : https://lk.linkedin.com/in/minudika
>>>>
>>>>
>>>>
>>>> On Tue, Mar 1, 2016 at 12:51 AM, Minudika Malshan <
>>>> minudika...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think zeppelin spark interpreter will give a solution to your
>>>>> problem.
>>>>>
>>>>> Regards.
>>>>> Minudika
>>>>>
>>>>> Minudika Malshan
>>>>> Undergraduate
>>>>> Department of Computer Science and Engineering
>>>>> University of Moratuwa.
>>>>> *Mobile : +94715659887 <%2B94715659887>*
>>>>> *LinkedIn* : https://lk.linkedin.com/in/minudika
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 1, 2016 at 12:35 AM, Sabarish Sasidharan <
>>>>> sabarish.sasidha...@manthan.com> wrote:
>>>>>
>>>>>> Zeppelin?
>>>>>>
>>>>>> Regards
>>>>>> Sab
>>>>>> On 01-Mar-2016 12:27 am, "Mich Talebzadeh" <mich.talebza...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is there such thing as Spark for client much like RDBMS client that
>>>>>>> have cut down version of their big brother useful for client 
>>>>>>> connectivity
>>>>>>> but cannot be used as server.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * 
>>>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>


Re: Spark for client

2016-03-01 Thread Mich Talebzadeh
Thanks Mohannad.

I installed Anaconda 3, which contains Jupyter. Now I want to access Spark
with Scala from Jupyter. What is the easiest way of doing this without using
Python?

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 1 March 2016 at 08:18, Mohannad Ali <man...@gmail.com> wrote:

> Jupyter (http://jupyter.org/) also supports Spark and generally it's a
> beast allows you to do so much more.
> On Mar 1, 2016 00:25, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:
>
>> Thank you very much both
>>
>> Zeppelin looks promising. Basically as I understand runs an agent on a
>> given port (I chose 21999) on the host that Spark is installed. I created a
>> notebook and running scripts through there. One thing for sure notebook
>> just returns the results rather all other stuff that one does not need/.
>>
>> Cheers,
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 29 February 2016 at 19:22, Minudika Malshan <minudika...@gmail.com>
>> wrote:
>>
>>> +Adding resources
>>> https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html
>>> https://zeppelin.incubator.apache.org
>>>
>>> Minudika Malshan
>>> Undergraduate
>>> Department of Computer Science and Engineering
>>> University of Moratuwa.
>>> *Mobile : +94715659887 <%2B94715659887>*
>>> *LinkedIn* : https://lk.linkedin.com/in/minudika
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 12:51 AM, Minudika Malshan <minudika...@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> I think zeppelin spark interpreter will give a solution to your
>>>> problem.
>>>>
>>>> Regards.
>>>> Minudika
>>>>
>>>> Minudika Malshan
>>>> Undergraduate
>>>> Department of Computer Science and Engineering
>>>> University of Moratuwa.
>>>> *Mobile : +94715659887 <%2B94715659887>*
>>>> *LinkedIn* : https://lk.linkedin.com/in/minudika
>>>>
>>>>
>>>>
>>>> On Tue, Mar 1, 2016 at 12:35 AM, Sabarish Sasidharan <
>>>> sabarish.sasidha...@manthan.com> wrote:
>>>>
>>>>> Zeppelin?
>>>>>
>>>>> Regards
>>>>> Sab
>>>>> On 01-Mar-2016 12:27 am, "Mich Talebzadeh" <mich.talebza...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is there such thing as Spark for client much like RDBMS client that
>>>>>> have cut down version of their big brother useful for client connectivity
>>>>>> but cannot be used as server.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * 
>>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>


Re: Spark for client

2016-03-01 Thread Mohannad Ali
Jupyter (http://jupyter.org/) also supports Spark, and generally it's a
beast that allows you to do so much more.
On Mar 1, 2016 00:25, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:

> Thank you very much both
>
> Zeppelin looks promising. Basically as I understand runs an agent on a
> given port (I chose 21999) on the host that Spark is installed. I created a
> notebook and running scripts through there. One thing for sure notebook
> just returns the results rather all other stuff that one does not need/.
>
> Cheers,
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 29 February 2016 at 19:22, Minudika Malshan <minudika...@gmail.com>
> wrote:
>
>> +Adding resources
>> https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html
>> https://zeppelin.incubator.apache.org
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa.
>> *Mobile : +94715659887 <%2B94715659887>*
>> *LinkedIn* : https://lk.linkedin.com/in/minudika
>>
>>
>>
>> On Tue, Mar 1, 2016 at 12:51 AM, Minudika Malshan <minudika...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I think zeppelin spark interpreter will give a solution to your problem.
>>>
>>> Regards.
>>> Minudika
>>>
>>> Minudika Malshan
>>> Undergraduate
>>> Department of Computer Science and Engineering
>>> University of Moratuwa.
>>> *Mobile : +94715659887 <%2B94715659887>*
>>> *LinkedIn* : https://lk.linkedin.com/in/minudika
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 12:35 AM, Sabarish Sasidharan <
>>> sabarish.sasidha...@manthan.com> wrote:
>>>
>>>> Zeppelin?
>>>>
>>>> Regards
>>>> Sab
>>>> On 01-Mar-2016 12:27 am, "Mich Talebzadeh" <mich.talebza...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there such thing as Spark for client much like RDBMS client that
>>>>> have cut down version of their big brother useful for client connectivity
>>>>> but cannot be used as server.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * 
>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Spark for client

2016-02-29 Thread Mich Talebzadeh
Thank you very much, both.

Zeppelin looks promising. Basically, as I understand it, it runs an agent on a
given port (I chose 21999) on the host where Spark is installed. I created a
notebook and am running scripts through there. One thing for sure: the notebook
just returns the results rather than all the other output that one does not need.

Cheers,



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 29 February 2016 at 19:22, Minudika Malshan <minudika...@gmail.com>
wrote:

> +Adding resources
> https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html
> https://zeppelin.incubator.apache.org
>
> Minudika Malshan
> Undergraduate
> Department of Computer Science and Engineering
> University of Moratuwa.
> *Mobile : +94715659887 <%2B94715659887>*
> *LinkedIn* : https://lk.linkedin.com/in/minudika
>
>
>
> On Tue, Mar 1, 2016 at 12:51 AM, Minudika Malshan <minudika...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I think zeppelin spark interpreter will give a solution to your problem.
>>
>> Regards.
>> Minudika
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa.
>> *Mobile : +94715659887 <%2B94715659887>*
>> *LinkedIn* : https://lk.linkedin.com/in/minudika
>>
>>
>>
>> On Tue, Mar 1, 2016 at 12:35 AM, Sabarish Sasidharan <
>> sabarish.sasidha...@manthan.com> wrote:
>>
>>> Zeppelin?
>>>
>>> Regards
>>> Sab
>>> On 01-Mar-2016 12:27 am, "Mich Talebzadeh" <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there such thing as Spark for client much like RDBMS client that
>>>> have cut down version of their big brother useful for client connectivity
>>>> but cannot be used as server.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * 
>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>
>>
>


Re: Spark for client

2016-02-29 Thread Minudika Malshan
+Adding resources
https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html
https://zeppelin.incubator.apache.org

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa.
*Mobile : +94715659887*
*LinkedIn* : https://lk.linkedin.com/in/minudika



On Tue, Mar 1, 2016 at 12:51 AM, Minudika Malshan <minudika...@gmail.com>
wrote:

> Hi,
>
> I think zeppelin spark interpreter will give a solution to your problem.
>
> Regards.
> Minudika
>
> Minudika Malshan
> Undergraduate
> Department of Computer Science and Engineering
> University of Moratuwa.
> *Mobile : +94715659887 <%2B94715659887>*
> *LinkedIn* : https://lk.linkedin.com/in/minudika
>
>
>
> On Tue, Mar 1, 2016 at 12:35 AM, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
>> Zeppelin?
>>
>> Regards
>> Sab
>> On 01-Mar-2016 12:27 am, "Mich Talebzadeh" <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is there such thing as Spark for client much like RDBMS client that have
>>> cut down version of their big brother useful for client connectivity but
>>> cannot be used as server.
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>
>


Re: Spark for client

2016-02-29 Thread Minudika Malshan
Hi,

I think the Zeppelin Spark interpreter will give a solution to your problem.

Regards.
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa.
*Mobile : +94715659887*
*LinkedIn* : https://lk.linkedin.com/in/minudika



On Tue, Mar 1, 2016 at 12:35 AM, Sabarish Sasidharan <
sabarish.sasidha...@manthan.com> wrote:

> Zeppelin?
>
> Regards
> Sab
> On 01-Mar-2016 12:27 am, "Mich Talebzadeh" <mich.talebza...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there such thing as Spark for client much like RDBMS client that have
>> cut down version of their big brother useful for client connectivity but
>> cannot be used as server.
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>


Re: Spark for client

2016-02-29 Thread Sabarish Sasidharan
Zeppelin?

Regards
Sab
On 01-Mar-2016 12:27 am, "Mich Talebzadeh" <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> Is there such thing as Spark for client much like RDBMS client that have
> cut down version of their big brother useful for client connectivity but
> cannot be used as server.
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>


Spark for client

2016-02-29 Thread Mich Talebzadeh
Hi,

Is there such a thing as Spark for clients, much like RDBMS clients that are
cut-down versions of their big brother, useful for client connectivity but
that cannot be used as a server?

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


spark yarn client mode

2016-01-19 Thread Sanjeev Verma
Hi

Do I need to install Spark on all the YARN cluster nodes if I want to submit
a job in yarn-client mode?
Is there any way to spawn Spark job executors on cluster nodes where I have
not installed Spark?

Thanks
Sanjeev


Re: spark yarn client mode

2016-01-19 Thread 刘虓
Hi,
No, you don't need to.
However, when submitting jobs, certain resources will be uploaded to
HDFS, which could be a performance issue.
Read the log and you will understand:

15/12/29 11:10:06 INFO Client: Uploading resource
file:/data/spark/spark152/lib/spark-assembly-1.5.2-hadoop2.6.0.jar -> hdfs

15/12/29 11:10:08 INFO Client: Uploading resource
file:/data/spark/spark152/python/lib/pyspark.zip -> hdfs

15/12/29 11:10:08 INFO Client: Uploading resource
file:/data/spark/spark152/python/lib/py4j-0.8.2.1-src.zip -> hdfs

15/12/29 11:10:08 INFO Client: Uploading resource
file:/data/tmp/spark-86791975-2cef-4663-aacd-5da95e58cd91/__spark_conf__6261788210225867171.zip
-> hdfs

2016-01-19 19:43 GMT+08:00 Sanjeev Verma :

> Hi
>
> Do I need to install spark on all the yarn cluster node if I want to
> submit the job to yarn client?
> is there any way exists in which I can spawn a spark job executors on the
> cluster nodes where I have not installed spark.
>
> Thanks
> Sanjeev
>


Re: strange behavior in spark yarn-client mode

2016-01-14 Thread Marcelo Vanzin
On Thu, Jan 14, 2016 at 10:17 AM, Sanjeev Verma
 wrote:
> now it spawn a single executors with 1060M size, I am not able to understand
> why this time it executes executors with 1G+overhead not 2G what I
> specified.

Where are you looking for the memory size for the container?

-- 
Marcelo




Re: strange behavior in spark yarn-client mode

2016-01-14 Thread Marcelo Vanzin
Please reply to the list.

The web ui does not show the total size of the executor's heap. It
shows the amount of memory available for caching data, which is, give
or take, 60% of the heap by default.

On Thu, Jan 14, 2016 at 11:03 AM, Sanjeev Verma
 wrote:
> I am looking into the web ui of spark application master(tab executors).
>
> On Fri, Jan 15, 2016 at 12:08 AM, Marcelo Vanzin 
> wrote:
>>
>> On Thu, Jan 14, 2016 at 10:17 AM, Sanjeev Verma
>>  wrote:
>> > now it spawn a single executors with 1060M size, I am not able to
>> > understand
>> > why this time it executes executors with 1G+overhead not 2G what I
>> > specified.
>>
>> Where are you looking for the memory size for the container?
>>
>> --
>> Marcelo
>
>



-- 
Marcelo




strange behavior in spark yarn-client mode

2016-01-14 Thread Sanjeev Verma
I am seeing strange behaviour while running Spark in yarn-client mode. I am
observing this on a single-node YARN cluster. In spark-defaults I have
configured the executor memory as 2g and started the spark shell as follows:

bin/spark-shell --master yarn-client

which triggers 2 executors on the node with 1060MB of memory each. I was able
to figure out that if you don't specify num-executors it will spawn 2
executors on the node by default.


Now when I try to run it again with

bin/spark-shell --master yarn-client --num-executors 1

it now spawns a single executor of 1060M size. I am not able to understand
why this time it launches the executor with 1G + overhead, not the 2G that I
specified.

Why am I seeing this strange behavior?


Re: Stop Spark yarn-client job

2015-11-26 Thread Jeff Zhang
Could you attach the yarn AM log ?

On Fri, Nov 27, 2015 at 8:10 AM, Jagat Singh <jagatsi...@gmail.com> wrote:

> Hi,
>
> What is the correct way to stop fully the Spark job which is running as
> yarn-client using spark-submit.
>
> We are using sc.stop in the code and can see the job still running (in
> yarn resource manager) after final hive insert is complete.
>
> The code flow is
>
> start context
> do somework
> insert to hive
> sc.stop
>
> This is sparkling water job is that matters.
>
> Is there anything else needed ?
>
> Thanks,
>
> J
>
>
>


-- 
Best Regards

Jeff Zhang


Stop Spark yarn-client job

2015-11-26 Thread Jagat Singh
Hi,

What is the correct way to fully stop a Spark job that is running in
yarn-client mode via spark-submit?

We are using sc.stop in the code and can see the job still running (in the
YARN resource manager) after the final Hive insert is complete.

The code flow is

start context
do somework
insert to hive
sc.stop

This is a Sparkling Water job, if that matters.

Is there anything else needed?

Thanks,

J
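
A minimal Scala sketch of the flow described above, with sc.stop() in a finally
block so the context is stopped even if an earlier step throws. The RDD work
below is only a placeholder for the real "do somework / insert to hive" steps,
not the actual Sparkling Water job:

import org.apache.spark.{SparkConf, SparkContext}

object StopExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stop-example"))
    try {
      // Stand-in for the real work and the Hive insert.
      val counts = sc.parallelize(1 to 1000).map(_ % 10).countByValue()
      counts.foreach(println)
    } finally {
      // Always stop the context, even on failure, so the YARN application
      // can finish instead of lingering in the resource manager.
      sc.stop()
    }
  }
}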


Spark Yarn-client Kerberos on remote cluster

2015-04-14 Thread philippe L
Dear All,

I would like to know if it is possible to configure the SparkConf() in order
to interact with a remote Kerberized cluster in yarn-client mode.

Spark will not be installed on the cluster itself, and the localhost cannot
ask for a ticket, but a keytab has been generated on purpose and provided for
the localhost.

My purpose is to code in Eclipse on my localhost and submit my code in
yarn-client mode to a fully Kerberized HDP 2.2 cluster.

I currently work with Spark on the cluster, but it is only a short-term
solution because in the end my localhost will be on Windows 7.

Thank you in advance for your answers.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Yarn-client-Kerberos-on-remote-cluster-tp22491.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark Yarn-client Kerberos on remote cluster

2015-04-14 Thread Neal Yin
If your localhost can't talk to a KDC, you can't access a Kerberized
cluster. A keytab file alone is not enough.

-Neal


On 4/14/15, 3:54 AM, philippe L lanckvrind.p@gmail.com wrote:

Dear All,

I would like to know if its possible to configure the SparkConf() in order
to interact with a remote kerberized cluster in yarn-client mode.

the spark will not be installed on the cluster itself and the localhost
can't ask for a ticket, But a keytab as been generated in purpose and
provide for the localhost.

My purpose is to code in Eclipse on my localhost and submit my code in
yarn-client mode on a HDP 2.2 fully kerberized.

I actually work with spark on the cluster but its only a short term
solution
because at the end my localhost will be on windows 7.

In advance, thank you for your answers



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Yarn-client-Kerberos-on-remote-cluster-tp22491.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.







Spark yarn-client submission example?

2015-03-17 Thread Michal Klos
Hi,

We have a Scala application and we want it to programmatically submit Spark
jobs to a Spark-YARN cluster in yarn-client mode.

We're running into a lot of classpath issues, e.g. once submitted it looks
for jars in our parent Scala application's local directory, jars that it
shouldn't need. Our setJars in the SparkContext only mentions our fat jar,
which should be all it needs. We are not sure why the other jars are being
included once we submit, and we don't see a mechanism to control what it
looks for.

Here's a sample error:

Diagnostics: java.io.FileNotFoundException: File
file:/Users/github/spark/kindling-container/lib/spark-assembly-1.2.1-hadoop2.4.0.jar
does not exist
Failing this attempt. Failing the application.


I read through the user list and there was discussion around possibly using
Client.scala?

Are there any code examples out there that we could use as reference?

thanks,
Michal
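
For reference, a minimal Scala sketch of what such a programmatic yarn-client
submission can look like with the Spark 1.x API. The app name and jar path are
placeholders, and it assumes HADOOP_CONF_DIR/YARN_CONF_DIR point at the cluster
configuration so the driver can find the ResourceManager and HDFS:

import org.apache.spark.{SparkConf, SparkContext}

object YarnClientSubmit {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("yarn-client")                      // yarn-client mode in Spark 1.x
      .setAppName("programmatic-yarn-client-example")
      // Only the application's fat jar should need to be shipped to executors;
      // the Spark assembly itself is usually located via SPARK_HOME or spark.yarn.jar.
      .setJars(Seq("/path/to/your-app-fat.jar"))

    val sc = new SparkContext(conf)
    try {
      println(sc.parallelize(1 to 100).sum())
    } finally {
      sc.stop()
    }
  }
}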


Re: Spark (yarn-client mode) Hangs in final stages of Collect or Reduce

2015-02-09 Thread nitin
Have you checked the corresponding executor logs as well? I think the
information provided here is too little to actually understand your issue.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-yarn-client-mode-Hangs-in-final-stages-of-Collect-or-Reduce-tp21551p21557.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark clustered client

2014-07-23 Thread Nick Pentreath
At the moment your best bet for sharing SparkContexts across jobs will be the
Ooyala job server: https://github.com/ooyala/spark-jobserver


It doesn't yet support Spark 1.0, though I did manage to amend it to get it to
build and run on 1.0.
—
Sent from Mailbox

On Wed, Jul 23, 2014 at 1:21 AM, Asaf Lahav asaf.la...@gmail.com wrote:

 Hi Folks,
 I have been trying to dig up some information in regards to what are the
 possibilities when wanting to deploy more than one client process that
 consumes Spark.
 Let's say I have a Spark Cluster of 10 servers, and would like to setup 2
 additional servers which are sending requests to it through a Spark
 context, referencing one specific file of 1TB of data.
 Each client process, has its own SparkContext instance.
 Currently, the result is that that same file is loaded into memory twice
 because the Spark Context resources are not shared between processes/jvms.
 I wouldn't like to have that same file loaded over and over again with
 every new client being introduced.
 What would be the best practice here? Am I missing something?
 Thank you,
 Asaf

Spark clustered client

2014-07-22 Thread Asaf Lahav
Hi Folks,

I have been trying to dig up some information in regards to what are the
possibilities when wanting to deploy more than one client process that
consumes Spark.

Let's say I have a Spark Cluster of 10 servers, and would like to setup 2
additional servers which are sending requests to it through a Spark
context, referencing one specific file of 1TB of data.

Each client process, has its own SparkContext instance.
Currently, the result is that the same file is loaded into memory twice
because the SparkContext resources are not shared between processes/JVMs.


I wouldn't like to have that same file loaded over and over again with
every new client being introduced.
What would be the best practice here? Am I missing something?

Thank you,
Asaf