Re: Learning Spark

2019-07-05 Thread Alex A. Reda
Hello,

I also second Gourav's point regarding the "Spark: The Definitive Guide"
book. It is great for learning both Scala- and Python-based Spark. But as
others mentioned, you will need to keep reading the documentation, as Spark
is still undergoing a lot of improvements. I list additional resources
below, no plug :)

-   Excellent training on Spark 2 on Udemy by Jose Portilla. This one is on
PySpark; he also has a course on Scala. Not super advanced, but it covers
the basics very well.
https://www.udemy.com/apache-spark-with-python-big-data-with-pyspark-and-spark/



-   Great book on Spark 2, "Learning PySpark" by Tomasz Drabas and Denny Lee,
so far the best in the resource lineup for both Scala-based and Python-based
Spark:
https://www.packtpub.com/big-data-and-business-intelligence/learning-pyspark
(Read chapters 1, 2, 4, and 6 to get immediate benefits.)



-   Great book on Spark, "Spark: The Definitive Guide" by Bill Chambers and
Matei Zaharia:
https://www.amazon.com/Spark-Definitive-Guide-Processing-Simple/dp/1491912219
(Parts I, II, and VI are the most important to get started.) Apparently,
there is a new edition; I am referring to the 2017 edition.


-   A bit dated now, because Spark has evolved so much, but I also like
Jeffrey Aven's book and writing style: "Sams Teach Yourself Apache Spark in
24 Hours".

In terms of actually learning, I would suggest practicing the code. Also,
based on my experience, you are better off installing Spark on your local
PC; I found this a much better way of learning than using an enterprise
cluster. Depending on which route you take, if you decide to focus on
PySpark, learning scikit-learn will give you a lot of transferable skills.

One final note, I am providing the suggestion from the perspective of a
data scientist.

Kind regards,

Alex Reda









Re: Learning Spark

2019-07-05 Thread Gourav Sengupta
Okay, this is all something I would disagree with.

Dr. Matei Zaharia created Spark.
Then he and Bill Chambers recently wrote a book on Spark.
He is still the main thinking power behind Spark (look at his research at
Stanford).
The name of the book is "Spark: The Definitive Guide"; it's the best book
and introduction to Spark ever.

I have been through plenty of documentation and at least 40 books on Spark,
and nothing even comes close to this book. It also puts to rest many of the
arguments around which language to choose.

Thanks and Regards,
Gourav Sengupta



Re: Learning Spark

2019-07-05 Thread Vikas Garg
Thanks!!!



Re: Learning Spark

2019-07-05 Thread Chris Teoh
Scala is better suited to data engineering work. It also has better
integration with other components like HBase, Kafka, etc.

Python is great for data scientists as there are more data science
libraries available in Python.




Re: Learning Spark

2019-07-05 Thread Vikas Garg
Is there any disadvantage to using Python? I have gone through multiple
articles which say that Python has advantages over Scala.

Scala is super fast in comparison, but Python has more pre-built libraries
and options for analytics.

Should I still go with Scala?



Re: Learning Spark

2019-07-05 Thread Kurt Fehlhauer
Since you are a data engineer, I would start by learning Scala. The parts of
Scala you would need to learn are pretty basic. Start with the examples on
the Spark website, which gives examples in multiple languages. Think of
Scala as a typed version of Python. You will find that the error messages
tend to be much more meaningful in Scala because that is the native
language of Spark. If you don’t want to install the JVM and Scala, I
highly recommend Databricks Community Edition as a place to start.
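One of the examples on the Spark website is a word count built from `flatMap`, `map` and `reduceByKey` (choosing it as a first exercise is my assumption, not Kurt's advice). To see what those three steps do before installing anything, here is a minimal sketch of the same pipeline in plain, type-annotated Python; `word_count` is a hypothetical name and the PySpark form in the docstring is approximate.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Count words the way the RDD word-count example does, in plain Python.

    Roughly equivalent PySpark, given a SparkContext `sc`:
        (sc.textFile(path)
           .flatMap(lambda line: line.split(" "))
           .map(lambda word: (word, 1))
           .reduceByKey(lambda a, b: a + b))
    """
    # flatMap: each line becomes zero or more words
    # (split() also drops the empty strings split(" ") keeps for repeated spaces)
    words: List[str] = [word for line in lines for word in line.split()]
    # map: each word becomes a (word, 1) pair
    pairs: List[Tuple[str, int]] = [(word, 1) for word in words]
    # reduceByKey: pairs sharing a key are combined with +
    counts: Dict[str, int] = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


print(word_count(["to be or", "not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The annotations loosely echo the "typed version of Python" remark: Scala would check these types at compile time, while Python treats them only as hints.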



Re: Learning Spark

2019-07-04 Thread Vikas Garg
I am currently working as a data engineer, on Power BI and SSIS (ETL tool).
For learning purposes, I have set up PySpark and am able to run queries
through Spark on a multi-node cluster database (I am using Vertica DB and
will later move to HDFS or SQL Server).

I also have a good knowledge of Python.
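Since the plan is to query Vertica now and SQL Server later, one way to keep connection details in one place is a small helper that builds the options for Spark's JDBC data source. This is a sketch: the name `jdbc_options`, the example hosts and the default ports (5433 for Vertica, 1433 for SQL Server) are assumptions to check against each driver's documentation, and the matching JDBC driver JAR still has to be on Spark's classpath (for example via `spark-submit --jars`).

```python
from typing import Dict, Optional

# Hypothetical URL templates and default ports; confirm them in each driver's docs.
JDBC_TEMPLATES = {
    "vertica": ("jdbc:vertica://{host}:{port}/{database}", 5433),
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}", 1433),
}


def jdbc_options(kind: str, host: str, database: str, table: str,
                 user: str, password: str, port: Optional[int] = None) -> Dict[str, str]:
    """Return keyword options for spark.read.format("jdbc").options(**opts)."""
    if kind not in JDBC_TEMPLATES:
        raise ValueError("unsupported database kind: %r" % kind)
    template, default_port = JDBC_TEMPLATES[kind]
    url = template.format(host=host, port=port or default_port, database=database)
    # "url", "dbtable", "user" and "password" are option names of Spark's JDBC source.
    return {"url": url, "dbtable": table, "user": user, "password": password}


opts = jdbc_options("vertica", "vertica.example.com", "analytics", "public.sales", "etl", "secret")
print(opts["url"])
# With a SparkSession `spark` and the driver JAR available:
#     df = spark.read.format("jdbc").options(**opts).load()
```

Moving to SQL Server should then only change the `kind` and host arguments, not the code that reads the DataFrame.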



Re: Learning Spark

2019-07-04 Thread ayan guha
My best advice is to go through the docs and watch lots of demos/videos
from Spark committers.

--
Best Regards,
Ayan Guha


Re: Learning Spark

2019-07-04 Thread Kurt Fehlhauer
Are you a data scientist or data engineer?




Learning Spark

2019-07-04 Thread Vikas Garg
Hi,

I am a new Spark learner. Can someone suggest a strategy for getting
expertise in PySpark?

Thanks!!!


Re: learning Spark

2017-12-05 Thread makoto
This GitBook explains Spark components in detail.

'Mastering Apache Spark 2'

https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details






Re: learning Spark

2017-12-05 Thread Jean Georges Perrin
When you pick a book, make sure it covers the version of Spark you want to
deploy. There are a lot of books out there that focus heavily on Spark 1.x.
Spark 2.x generalizes the DataFrame API, introduces second-generation
Tungsten (whole-stage code generation), etc. Not all of this may be relevant
to pure "sys admin" learning, but it is good to know.

jg




Re: learning Spark

2017-12-04 Thread Elior Malul
Also, our community is responsive on Stack Overflow, and I will be happy to
help whenever I can.



Re: learning Spark

2017-12-04 Thread yohann jardin
Plenty of documentation is available on the Spark website itself:
http://spark.apache.org/docs/latest/#where-to-go-from-here

You’ll find deployment guides, tuning, etc.

Yohann Jardin




Re: learning Spark

2017-12-04 Thread Somasundaram Sekar
Learning Spark (O'Reilly) as a starter, plus the official docs.


-- 
*Disclaimer*: This e-mail is intended to be delivered only to the named 
addressee(s). If this information is received by anyone other than the 
named addressee(s), the recipient(s) should immediately notify 
i...@tigeranalytics.com and promptly delete the transmitted material from 
your computer and server.   In no event shall this material be read, used, 
stored, or retained by anyone other than the named addressee(s) without the 
express written consent of the sender or the named addressee(s). Computer 
viruses can be transmitted via email. The recipient should check this email and 
any attachments for viruses. The company accepts no liability for any 
damage caused by any virus transmitted by this email.


learning Spark

2017-12-03 Thread Manuel Sopena Ballesteros
Dear Spark community,

Is there any resource (books, online course, etc.) you know of for learning
about Spark? I am interested in the sysadmin side of it: the different parts
inside Spark, how Spark works internally, the best ways to
install/deploy/monitor it, and how to get the best performance possible.

Any suggestion?

Thank you very much

Manuel Sopena Ballesteros | Systems Engineer
Garvan Institute of Medical Research
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: 
manuel...@garvan.org.au

NOTICE
Please consider the environment before printing this email. This message and 
any attachments are intended for the addressee named and may contain legally 
privileged/confidential/copyright information. If you are not the intended 
recipient, you should not read, use, disclose, copy or distribute this 
communication. If you have received this message in error please notify us at 
once by return email and then delete both messages. We accept no liability for 
the distribution of viruses or similar in electronic communications. This 
notice should not be removed.


Re: Resources for learning Spark administration

2016-09-04 Thread Mich Talebzadeh
Hi,

There is a lot of stuff to cover here, depending on the business and your
needs.

Do you mean:

   1. Hardware spec for Spark master and nodes
   2. The number of nodes, and how to scale them
   3. Where to set up Spark nodes: on the same hardware nodes as HDFS
   (assuming you are using Hadoop) or on the same subnet
   4. The network bandwidth between the Spark cluster and the Hadoop cluster
   5. The mode of operation: local, standalone, YARN (cluster and client),
   etc.
   6. General Spark admin, job and log monitoring
   7. Security admin
   8. Setting up and configuring Spark Thrift Servers (STS), and how many
   (multiple STSs on the same host, on different nodes, etc.)
   9. A host of other matters

The Spark online documentation is a good place to start.
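Items 1, 2 and 5 in the list above (hardware, node count, mode of operation) usually meet in the flags passed to `spark-submit`. The sketch below turns a per-node hardware spec into executor settings and a master URL. The sizing rules are a commonly quoted rule of thumb, not something stated in this thread, and every constant is an assumption: leave one core and 1 GB per node for the OS and Hadoop daemons, use about five cores per executor, keep one executor slot for the YARN ApplicationMaster, and reserve max(384 MB, 10%) of each container as memory overhead. The function names are hypothetical; Spark 1.x used `--master yarn-client` / `yarn-cluster` instead of `--master yarn` plus `--deploy-mode`.

```python
from typing import Dict, List


def master_url(mode: str, host: str = "", port: int = 7077, threads: str = "*") -> str:
    """Master URL for the modes in item 5 (Spark 2.x style for YARN)."""
    if mode == "local":
        return "local[%s]" % threads           # everything in one JVM on this machine
    if mode == "standalone":
        return "spark://%s:%d" % (host, port)  # Spark's own cluster manager
    if mode == "yarn":
        return "yarn"                          # client vs cluster goes in --deploy-mode
    raise ValueError("unknown mode: %r" % mode)


def size_executors(nodes: int, cores_per_node: int, mem_gb_per_node: int,
                   cores_per_executor: int = 5, overhead_fraction: float = 0.10,
                   min_overhead_mb: int = 384) -> Dict[str, int]:
    """Rule-of-thumb executor sizing for YARN; every constant is an assumption."""
    usable_cores = cores_per_node - 1             # one core left for OS/daemons
    usable_mem_mb = (mem_gb_per_node - 1) * 1024  # 1 GB left for OS/daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("too few cores per node for one executor")
    container_mb = usable_mem_mb // executors_per_node
    # Pick a heap so that heap + max(min_overhead, fraction * heap) fits the container.
    heap_mb = int(container_mb / (1 + overhead_fraction))
    if heap_mb * overhead_fraction < min_overhead_mb:
        heap_mb = container_mb - min_overhead_mb
    return {
        "num_executors": nodes * executors_per_node - 1,  # one slot for the YARN AM
        "executor_cores": cores_per_executor,
        "executor_memory_mb": heap_mb,
    }


def submit_command(app: str, sizing: Dict[str, int], deploy_mode: str = "client") -> List[str]:
    """Assemble a spark-submit argument list for YARN from the sizing above."""
    return [
        "spark-submit", "--master", master_url("yarn"), "--deploy-mode", deploy_mode,
        "--num-executors", str(sizing["num_executors"]),
        "--executor-cores", str(sizing["executor_cores"]),
        "--executor-memory", "%dm" % sizing["executor_memory_mb"],
        app,
    ]


sizing = size_executors(nodes=6, cores_per_node=16, mem_gb_per_node=64)
print(" ".join(submit_command("my_job.py", sizing)))
```

For six 16-core, 64 GB nodes this prints 17 executors with 5 cores and 19549 MB of heap each; `my_job.py` is a placeholder application.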

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Resources for learning Spark administration

2016-09-04 Thread Somasundaram Sekar
Please suggest some good resources to learn Spark administration.


Need 'Learning Spark' Partner

2016-01-13 Thread King sami
Hi,

As I'm a beginner in Spark, I'm looking for another beginner to learn and
train on Spark with.

Please contact me if you're interested.

Cordially,


Re: Install via directions in "Learning Spark". Exception when running bin/pyspark

2015-10-13 Thread Robineast
What you have done should work.

A couple of things to try:

1) You should have a lib directory in your Spark deployment containing a
JAR file called lib/spark-assembly-1.5.1-hadoop2.6.0.jar. Is it there?
2) Have you set the JAVA_HOME variable to point to your Java 8 installation?
If not, try doing that.
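Robin's two checks can be scripted so they are easy to rerun after changing the Java setup. This is a minimal sketch under a few assumptions: it expects the Spark 1.x layout with a `lib/spark-assembly-*.jar` (Spark 2.x and later ship a `jars/` directory instead), it takes the Java version string as an argument rather than running `java -version` itself, and the Java 7 minimum is my reading of the Spark 1.5 requirements, so check the docs for your version. The function names are hypothetical.

```python
import glob
import os
import re
from typing import List, Optional


def find_assembly_jar(spark_home: str) -> Optional[str]:
    """Return lib/spark-assembly-*.jar under spark_home, or None (Spark 1.x layout)."""
    matches = sorted(glob.glob(os.path.join(spark_home, "lib", "spark-assembly-*.jar")))
    return matches[0] if matches else None


def java_major_version(version: str) -> int:
    """Map '1.8.0_60' to 8 and '11.0.2' to 11."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised Java version: %r" % version)
    first = int(match.group(1))
    # Up to Java 8 the major version is the second field of "1.x".
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def check_environment(spark_home: str, java_home: str, java_version: str,
                      minimum_java: int = 7) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if find_assembly_jar(spark_home) is None:
        problems.append("no lib/spark-assembly-*.jar under %s" % spark_home)
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isfile(os.path.join(java_home, "bin", "java")):
        problems.append("%s has no bin/java" % java_home)
    if java_major_version(java_version) < minimum_java:
        problems.append("Java %s is older than Java %d" % (java_version, minimum_java))
    return problems


if __name__ == "__main__":
    # The version string would normally come from `java -version`; here it is hard-coded.
    for problem in check_environment(os.environ.get("SPARK_HOME", "."),
                                     os.environ.get("JAVA_HOME", ""), "1.8.0_60"):
        print("problem:", problem)
```

Passing the version in as a string keeps the checks testable; a machine with several JDKs installed can then be probed one `JAVA_HOME` at a time.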

Robin



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Install-via-directions-in-Learning-Spark-Exception-when-running-bin-pyspark-tp25043p25048.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Install via directions in "Learning Spark". Exception when running bin/pyspark

2015-10-13 Thread David Bess
Got it working! Thank you for confirming my suspicion that this issue was
related to Java. When I dug deeper, I found multiple Java versions and some
other issues. I worked on it for a while before deciding it would be easier
to uninstall all Java versions and reinstall a clean JDK, and now it works
perfectly.






Install via directions in "Learning Spark". Exception when running bin/pyspark

2015-10-12 Thread David Bess
Greetings all,

Excited to be learning Spark. I am working through the "Learning Spark"
book, and I am having trouble getting Spark installed and running.

This is what I have done so far.  

I downloaded Spark from here:

http://spark.apache.org/downloads.html

selecting 1.5.1, pre-built for Hadoop 2.6 and later, direct download.

I untarred the download:

cd downloads
tar -xf spark-1.5.1-bin-hadoop2.6.tgz
cd spark-1.5.1-bin-hadoop2.6

Next I tried running a shell. The book says we can run in local mode, with
no need to install Hadoop, YARN, Mesos or anything else to get started.

I have tried the following commands:

./bin/pyspark
bin/pyspark
./bin/spark-shell
bin/spark-shell

I am getting an error as follows:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/launcher/Main
Caused by: java.lang.ClassNotFoundException: org.apache.spark.launcher.Main
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
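A NoClassDefFoundError for org.apache.spark.launcher.Main means the JVM could not load the launcher class from the jar it was given. One hypothesis worth ruling out (an assumption, not a diagnosis made in this thread) is that the jar is intact but the JVM cannot read it: a classic ZIP archive holds at most 65,535 entries, larger ones need the Zip64 extension, and, as we understand it, Java 6 could not read Zip64 jars. The minimal Python 3 sketch below checks a jar for the launcher entry and counts its entries. The lib/spark-assembly-*.jar location is an assumption about the Spark 1.x binary layout, and the helper names are hypothetical.

```python
import glob
import os
import sys
import zipfile

# Class named in the NoClassDefFoundError above, as a jar entry path.
LAUNCHER_ENTRY = "org/apache/spark/launcher/Main.class"
# A classic (non-Zip64) ZIP central directory can count at most 65535 entries.
CLASSIC_ZIP_MAX_ENTRIES = 65535


def inspect_entries(names):
    """Summarize a jar's entry list (hypothetical helper for illustration)."""
    return {
        "has_launcher": LAUNCHER_ENTRY in names,
        "entries": len(names),
        "needs_zip64": len(names) > CLASSIC_ZIP_MAX_ENTRIES,
    }


def inspect_jar(path):
    """Open a jar with Python's zipfile (which can read Zip64) and summarize it."""
    with zipfile.ZipFile(path) as jar:
        return inspect_entries(jar.namelist())


if __name__ == "__main__":
    # Assumed Spark 1.x layout: <spark dir>/lib/spark-assembly-*.jar
    spark_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    pattern = os.path.join(spark_dir, "lib", "spark-assembly-*.jar")
    for jar_path in glob.glob(pattern):
        print(jar_path, inspect_jar(jar_path))
```

Run it as, for example, python3 check_jar.py spark-1.5.1-bin-hadoop2.6 (the script name is hypothetical). If has_launcher is True, the jar does contain the class, and the JVM that tried to read it becomes the main suspect, which would fit the Java clean-up that eventually resolved this thread.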

About my system:

I have a MacBook Pro running OS X Yosemite 10.10.5.
I just downloaded and installed the latest Java from the Oracle website; I
believe this was Java 8u60.
I double-checked my Python version; it is 2.7.10.
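Because the eventual fix in this thread was Java-related, it can help to confirm which JVM the shell actually runs: the java found first on PATH is not necessarily the one just installed. Below is a minimal Python 3 sketch under stated assumptions. It runs the first java on PATH, parses the version banner (which java -version prints to stderr), and compares the major version against a minimum. The default minimum of 7 is our assumption for Spark 1.5.x, so check the documentation for your Spark release; the function names are hypothetical.

```python
import re
import shutil
import subprocess

# Matches "1.8.0_60"-style and "11.0.2"-style version strings in `java -version` output.
VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def parse_java_major(version_output):
    """Return the Java major version (e.g. 8 for "1.8.0_60"), or None if not found."""
    match = VERSION_RE.search(version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy scheme: 1.<major>.0_<update>
    return first  # Java 9+ scheme: <major>.<minor>.<security>


def java_meets_minimum(version_output, min_major=7):
    """min_major=7 is an assumed floor for Spark 1.5.x; check your release's docs."""
    major = parse_java_major(version_output)
    return major is not None and major >= min_major


def first_java_on_path():
    """Return (path, version banner) for the java the shell would run, or None."""
    java = shutil.which("java")
    if java is None:
        return None
    try:
        # `java -version` prints its banner to stderr, so fold stderr into the output.
        banner = subprocess.check_output([java, "-version"], stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError) as err:
        return java, "error: %s" % err
    return java, banner.decode("utf-8", "replace")


if __name__ == "__main__":
    found = first_java_on_path()
    if found is None:
        print("no java on PATH")
    else:
        path, banner = found
        print(path, "-> major", parse_java_major(banner), "ok:", java_meets_minimum(banner))
```

On OS X, /usr/libexec/java_home -V lists every installed JVM, which makes the kind of multiple-version clutter described in the follow-up above easy to spot.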

I am familiar with the command line and have a background in Hadoop, but this
has me stumped.

Thanks in advance,

David Bess









Re: Learning Spark

2015-04-06 Thread Akhil Das
We had a few sessions at Sigmoid; you can go through the meetup page for
details:

http://www.meetup.com/Real-Time-Data-Processing-and-Cloud-Computing/



Re: Learning Spark

2015-04-06 Thread Ted Yu
bq. I need to know on what all databases

You can access HBase using Spark.

Cheers





Cannot build learning spark project

2015-04-06 Thread Adamantios Corais
Hi,

I am trying to build this project
https://github.com/databricks/learning-spark with mvn package. This should
work out of the box, but unfortunately it doesn't. In fact, I get the
following error:

mvn pachage -X
 Apache Maven 3.0.5
 Maven home: /usr/share/maven
 Java version: 1.7.0_76, vendor: Oracle Corporation
 Java home: /usr/lib/jvm/java-7-oracle/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: linux, version: 3.13.0-45-generic, arch: amd64, family:
 unix
 [INFO] Error stacktraces are turned on.
 [DEBUG] Reading global settings from /usr/share/maven/conf/settings.xml
 [DEBUG] Reading user settings from /home/adam/.m2/settings.xml
 [DEBUG] Using local repository at /home/adam/.m2/repository
 [DEBUG] Using manager EnhancedLocalRepositoryManager with priority 10 for
 /home/adam/.m2/repository
 [INFO] Scanning for projects...
 [DEBUG] Extension realms for project
 com.oreilly.learningsparkexamples:java:jar:0.0.2: (none)
 [DEBUG] Looking up lifecyle mappings for packaging jar from
 ClassRealm[plexus.core, parent: null]
 [ERROR] The build could not read 1 project - [Help 1]
 org.apache.maven.project.ProjectBuildingException: Some problems were
 encountered while processing the POMs:
 [ERROR] 'dependencies.dependency.artifactId' for
 org.scalatest:scalatest_${scala.binary.version}:jar with value
 'scalatest_${scala.binary.version}' does not match a valid id pattern. @
 line 101, column 19
 at
 org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:363)
 at org.apache.maven.DefaultMaven.collectProjects(DefaultMaven.java:636)
 at
 org.apache.maven.DefaultMaven.getProjectsForMavenReactor(DefaultMaven.java:585)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:234)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
 at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
 at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
 [ERROR]
 [ERROR]   The project com.oreilly.learningsparkexamples:java:0.0.2
 (/home/adam/learning-spark/learning-spark-master/pom.xml) has 1 error
 [ERROR] 'dependencies.dependency.artifactId' for
 org.scalatest:scalatest_${scala.binary.version}:jar with value
 'scalatest_${scala.binary.version}' does not match a valid id pattern. @
 line 101, column 19
 [ERROR]
 [ERROR]
 [ERROR] For more information about the errors and possible solutions,
 please read the following articles:
 [ERROR] [Help 1]
 http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException


As a further step, I would like to know how to build it against DataStax
Enterprise 4.6.2.

Any help is appreciated!


*// Adamantios*
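The Maven error suggests that ${scala.binary.version} was never resolved: when a property is undefined, Maven leaves the literal ${...} text in the artifactId, and the $, { and } characters fail the id pattern check. Below is a minimal Python 3 sketch of a hypothetical helper (not part of Maven) that lists ${...} references with no matching entry in the POM's top-level <properties> block. It deliberately ignores profile-level properties, parent POMs, and -D command-line definitions, and it treats a small, illustrative set of prefixes such as project. and env. as built in.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Prefixes Maven resolves on its own; a small illustrative subset, not the full list.
BUILTIN_PREFIXES = ("project.", "pom.", "env.", "settings.", "maven.",
                    "basedir", "java.", "os.", "user.")

# A ${name} reference anywhere in the POM text (comments included).
PROPERTY_REF = re.compile(r"\$\{([^}]+)\}")


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tags in a namespaced POM."""
    return tag.split("}")[-1]


def undefined_properties(pom_text):
    """Return sorted ${...} names used in pom_text but not defined in top-level <properties>."""
    root = ET.fromstring(pom_text)
    defined = set()
    for child in root:  # only direct children of <project>, so profiles are ignored
        if local_name(child.tag) == "properties":
            defined.update(local_name(prop.tag) for prop in child)
    used = set(PROPERTY_REF.findall(pom_text))
    return sorted(
        name for name in used
        if name not in defined and not name.startswith(BUILTIN_PREFIXES)
    )


if __name__ == "__main__":
    # Usage: python3 pom_props.py path/to/pom.xml [more.xml ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(path, "->", undefined_properties(f.read()) or "no undefined properties")
```

If scala.binary.version shows up in the output, defining it in <properties> with the Scala binary version your Spark build uses, or passing it with -D on the Maven command line, are the usual ways to resolve it; the value this project expects is not stated in this thread.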


Re: Cannot build learning spark project

2015-04-06 Thread Sean Owen
(This mailing list concerns Spark itself rather than the book about
Spark. Your question is about building code that isn't part of Spark,
so the right place to ask is https://github.com/databricks/learning-spark
itself. You have a typo in "pachage", but I assume that's just a typo in
this email.)







Learning Spark

2015-04-06 Thread Abhideep Chakravarty
Hi all,

We are planning to set up a series of Spark learning sessions. I would like
your input on a table of contents for this program, i.e. what to cover if we
start from the basics, and how far we should go to cover all aspects of Spark
in detail.

Also, I need to know which databases Spark can work with (other than
Cassandra).

Input from you will be very helpful. Thanks in advance for your time and effort.

Regards,
Abhideep



http://www.mindtree.com/email/disclaimer.html