How to persist a database/table created in SparkSession

2017-12-04 Thread 163
Hi, 
How can I persist a database/table created in a Spark application?

object TestPersistentDB {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Create persistent table")
      .config("spark.master", "local")
      .getOrCreate()
    import spark.implicits._

    spark.sql("create database testdb location 'hdfs://node1:8020/testdb'")

    spark.stop()
  }
}

  When I run spark.sql("create database ...") in a SparkSession and then close the
SparkSession, the created database is not persisted to the metastore, so I cannot
find it with "show databases" in spark-sql.



regards
wendy
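
For reference, the usual cause is that a plain SparkSession uses Spark's default
in-memory catalog, so catalog entries disappear with the session. Below is a minimal
sketch of a variant that persists the database, assuming the spark-hive module is on
the classpath and a Hive metastore (hive-site.xml, or the default embedded Derby one)
is visible to both this application and the spark-sql shell:

import org.apache.spark.sql.SparkSession

object TestPersistentDB {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Create persistent table")
      .master("local")
      // Store catalog entries in the Hive metastore instead of the
      // session-scoped in-memory catalog.
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("create database if not exists testdb location 'hdfs://node1:8020/testdb'")
    spark.stop()
  }
}

With the same metastore configuration on both sides, "show databases" in spark-sql
should then list testdb.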

Re: learning Spark

2017-12-04 Thread Elior Malul
Our community is also responsive on Stack Overflow, and I will be happy to
help whenever I can.
> On Dec 5, 2017, at 9:14 AM, yohann jardin  wrote:
> 
> Plenty of documentation is available on Spark website itself: 
> http://spark.apache.org/docs/latest/#where-to-go-from-here 
> 
> You’ll find deployment guides, tuning, etc.
> Yohann Jardin
> 
> Le 05-Dec-17 à 1:38 AM, Somasundaram Sekar a écrit :
>> Learning Spark - the O'Reilly book - as a starter, plus the official docs
>> 
>> On 4 Dec 2017 9:19 am, "Manuel Sopena Ballesteros" wrote:
>> Dear Spark community,
>> 
>>  
>> Is there any resource (books, online courses, etc.) available that you know
>> of to learn about Spark? I am interested in the sys admin side of it: the
>> different parts inside Spark, how Spark works internally, the best ways to
>> install/deploy/monitor it, and how to get the best performance possible.
>> 
>>  
>> Any suggestion?
>> 
>>  
>> Thank you very much
>> 
>>  
>> Manuel Sopena Ballesteros | Systems Engineer
>> Garvan Institute of Medical Research 
>> The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010 
>> 
>> T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: manuel...@garvan.org.au 
>> 
>>  
> 



Re: learning Spark

2017-12-04 Thread yohann jardin
Plenty of documentation is available on Spark website itself: 
http://spark.apache.org/docs/latest/#where-to-go-from-here

You’ll find deployment guides, tuning, etc.

Yohann Jardin

Le 05-Dec-17 à 1:38 AM, Somasundaram Sekar a écrit :
Learning Spark - the O'Reilly book - as a starter, plus the official docs

On 4 Dec 2017 9:19 am, "Manuel Sopena Ballesteros" wrote:
Dear Spark community,

Is there any resource (books, online courses, etc.) available that you know of
to learn about Spark? I am interested in the sys admin side of it: the different
parts inside Spark, how Spark works internally, the best ways to
install/deploy/monitor it, and how to get the best performance possible.

Any suggestion?

Thank you very much

Manuel Sopena Ballesteros | Systems Engineer
Garvan Institute of Medical Research
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 
2010
T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: 
manuel...@garvan.org.au




Re: learning Spark

2017-12-04 Thread Somasundaram Sekar
Learning Spark - the O'Reilly book - as a starter, plus the official docs

On 4 Dec 2017 9:19 am, "Manuel Sopena Ballesteros" wrote:

> Dear Spark community,
>
>
>
> Is there any resource (books, online courses, etc.) available that you know
> of to learn about Spark? I am interested in the sys admin side of it: the
> different parts inside Spark, how Spark works internally, the best ways to
> install/deploy/monitor it, and how to get the best performance possible.
>
>
>
> Any suggestion?
>
>
>
> Thank you very much
>
>
>
> Manuel Sopena Ballesteros | Systems Engineer
> Garvan Institute of Medical Research
> The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
>
> T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E:
> manuel...@garvan.org.au
>
>


Re: Access to Applications metrics

2017-12-04 Thread Qiao, Richard
It works for collecting job-level metrics, through the Jolokia Java agent.

Best Regards
Richard
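
For anyone searching the archive, a rough sketch of what that setup tends to look
like; the agent jar path and port are placeholders, not something taken from this
thread:

# conf/metrics.properties - expose Spark's metrics as JMX MBeans
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

# spark-defaults.conf - attach the Jolokia JVM agent so those MBeans are
# reachable over HTTP/JSON; adjust the jar path and port per host
spark.driver.extraJavaOptions    -javaagent:/opt/jolokia/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0
spark.executor.extraJavaOptions  -javaagent:/opt/jolokia/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0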


From: Nick Dimiduk 
Date: Monday, December 4, 2017 at 6:53 PM
To: "user@spark.apache.org" 
Subject: Re: Access to Applications metrics

Bump.

On Wed, Nov 15, 2017 at 2:28 PM, Nick Dimiduk wrote:
Hello,

I'm wondering if it's possible to get access to the detailed job/stage/task
level metrics via the metrics system (JMX, Graphite, etc.). I've enabled the
wildcard sink and I do not see them. It seems these values are only available
over HTTP/JSON and to SparkListener instances - is this the case? Has anyone
worked on a SparkListener that would bridge data from one to the other?

Thanks,
Nick





Re: Access to Applications metrics

2017-12-04 Thread Nick Dimiduk
Bump.

On Wed, Nov 15, 2017 at 2:28 PM, Nick Dimiduk  wrote:

> Hello,
>
> I'm wondering if it's possible to get access to the detailed
> job/stage/task level metrics via the metrics system (JMX, Graphite, etc.).
> I've enabled the wildcard sink and I do not see them. It seems these values
> are only available over HTTP/JSON and to SparkListener instances - is this
> the case? Has anyone worked on a SparkListener that would bridge data from
> one to the other?
>
> Thanks,
> Nick
>
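
For context, nothing stops you from writing that bridge yourself. A minimal sketch
follows; the metric names and the registry wiring are illustrative assumptions, not
an existing Spark API beyond SparkListener itself:

import com.codahale.metrics.MetricRegistry
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Forwards a few task-level metrics from listener callbacks into a Dropwizard
// MetricRegistry, which a JMX/Graphite reporter can then expose.
class TaskMetricsBridge(registry: MetricRegistry) extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      registry.counter("tasks.completed").inc()
      registry.counter("tasks.executorRunTimeMs").inc(metrics.executorRunTime)
      registry.counter("tasks.shuffleBytesWritten").inc(metrics.shuffleWriteMetrics.bytesWritten)
    }
  }
}

// Registered on the driver, e.g.:
//   spark.sparkContext.addSparkListener(new TaskMetricsBridge(registry))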


Re: Add snappy support for spark in Windows

2017-12-04 Thread Junfeng Chen
I have put winutils and hadoop.dll within HADOOP_HOME, and Spark works well
with it, but the Snappy decompress function throws the above exception.


Regard,
Junfeng Chen

On Mon, Dec 4, 2017 at 7:07 PM, Qiao, Richard 
wrote:

> Junfeng, it is worth a try to start your local Spark with the
> hadoop.dll/winutils.exe Hadoop Windows support package in HADOOP_HOME,
> if you didn't do that yet.
>
>
>
> Best Regards
>
> Richard
>
>
>
>
>
> From: Junfeng Chen
> Date: Monday, December 4, 2017 at 3:53 AM
> To: "Qiao, Richard"
> Cc: "user@spark.apache.org"
> Subject: Re: Add snappy support for spark in Windows
>
>
>
> But I am working on my local development machine, so it should have no
> relation to workers/executors.
>
>
>
> I found some documents about enabling Snappy on Hadoop. If I want to use
> Snappy with Spark, do I need to configure Spark the same way as Hadoop, or
> is there an easier way to enable it?
>
>
>
>
> Regard,
> Junfeng Chen
>
>
>
> On Mon, Dec 4, 2017 at 4:12 PM, Qiao, Richard 
> wrote:
>
> It seems a common mistake that the path is not accessible by
> workers/executors.
>
>
>
> Best regards
>
> Richard
>
> Sent from my iPhone
>
>
> On Dec 3, 2017, at 22:32, Junfeng Chen  wrote:
>
> I am working on importing snappy compressed json file into spark rdd or
> dataset. However I meet this error: java.lang.UnsatisfiedLinkError:
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>
> I have set the following configuration:
>
> SparkConf conf = new SparkConf()
>
> .setAppName("normal spark")
>
> .setMaster("local")
>
> .set("spark.io.compression.codec", 
> "org.apache.spark.io.SnappyCompressionCodec")
>
> 
> .set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
>
> 
> .set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
>
> 
> .set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
>
> 
> .set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
>
> ;
>
> Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path,
> and I can find the snappy jar file snappy-0.2.jar and
> snappy-java-1.1.2.6.jar in
>
> D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\
>
> However nothing works and even the error message not change.
>
> How can I fix it?
>
>
>
> ref of stackoverflow: https://stackoverflow.com/questions/
> 47626012/config-snappy-support-for-spark-in-windows
> 
>
>
>
>
>
>
> Regard,
> Junfeng Chen
>
>
>
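
For anyone hitting the same UnsatisfiedLinkError: having hadoop.dll sitting inside
HADOOP_HOME is usually not enough on its own; the directory containing it also has
to be loadable by the JVM (java.library.path or the Windows PATH), since that error
means libhadoop was never loaded. A rough sketch of the relevant knobs, with
placeholder paths and assuming the winutils/hadoop.dll bundle matches the Hadoop 2.7
build Spark expects:

// Sketch only - paths are placeholders.
object WindowsHadoopNativeCheck {
  def main(args: Array[String]): Unit = {
    // Used by Hadoop's Shell to locate bin\winutils.exe (same effect as HADOOP_HOME).
    System.setProperty("hadoop.home.dir", "D:\\hadoop")

    // hadoop.dll itself is loaded via System.loadLibrary, so D:\hadoop\bin must be
    // visible when the JVM starts, e.g. launch the driver with
    //   -Djava.library.path=D:\hadoop\bin
    // or append D:\hadoop\bin to the Windows PATH before starting it.
    println("java.library.path = " + System.getProperty("java.library.path"))
  }
}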


Re: Add snappy support for spark in Windows

2017-12-04 Thread Qiao, Richard
Junfeng, it is worth a try to start your local Spark with the hadoop.dll/winutils.exe
Hadoop Windows support package in HADOOP_HOME, if you didn't do that yet.

Best Regards
Richard


From: Junfeng Chen 
Date: Monday, December 4, 2017 at 3:53 AM
To: "Qiao, Richard" 
Cc: "user@spark.apache.org" 
Subject: Re: Add snappy support for spark in Windows

But I am working on my local development machine, so it should have no relation
to workers/executors.

I found some documents about enabling Snappy on Hadoop. If I want to use Snappy
with Spark, do I need to configure Spark the same way as Hadoop, or is there an
easier way to enable it?


Regard,
Junfeng Chen

On Mon, Dec 4, 2017 at 4:12 PM, Qiao, Richard wrote:
It seems a common mistake that the path is not accessible by workers/executors.

Best regards
Richard

Sent from my iPhone

On Dec 3, 2017, at 22:32, Junfeng Chen wrote:

I am working on importing snappy compressed json file into spark rdd or 
dataset. However I meet this error: java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

I have set the following configuration:

SparkConf conf = new SparkConf()

.setAppName("normal spark")

.setMaster("local")

.set("spark.io.compression.codec", 
"org.apache.spark.io.SnappyCompressionCodec")


.set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")


.set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")


.set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")


.set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")

;

Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path, and I 
can find the snappy jar file snappy-0.2.jar and snappy-java-1.1.2.6.jar in

D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\

However, nothing works and the error message does not even change.

How can I fix it?



ref of stackoverflow: 
https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows
 



Regard,
Junfeng Chen





Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-04 Thread bsikander
So, I tried to use SparkAppHandle.Listener with SparkLauncher as you
suggested. The behavior of Launcher is not what I expected.

1- If I start the job (using SparkLauncher) and my Spark cluster has enough
cores available, I receive events in my class extending
SparkAppHandle.Listener, and I see the status changing from
UNKNOWN -> CONNECTED -> SUBMITTED -> RUNNING. All good here.

2- If my Spark cluster has cores only for my driver process (running in
cluster mode) but no cores for my executors, then I still receive the RUNNING
event. I was expecting something else, since my executors have no cores and
the Master UI shows the WAITING state for executors; the listener should
report SUBMITTED instead of RUNNING.

3- If my Spark cluster has no cores even for the driver process, then
SparkLauncher invokes no events at all. The state stays UNKNOWN. I would
have expected it to be in the SUBMITTED state at least.

*Is there any way with which I can reliably get the WAITING state of a job?*
Driver=RUNNING, executor=RUNNING, overall state should be RUNNING
Driver=RUNNING, executor=WAITING overall state should be SUBMITTED/WAITING
Driver=WAITING, executor=WAITING overall state should be
CONNECTED/SUBMITTED/WAITING
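
For readers following along, a minimal sketch of the launcher/listener wiring being
described; the application resource, main class and master URL are placeholders:

import java.util.concurrent.CountDownLatch
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LauncherStateDemo {
  def main(args: Array[String]): Unit = {
    val finished = new CountDownLatch(1)
    new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")   // placeholder
      .setMainClass("com.example.MyApp")       // placeholder
      .setMaster("spark://master:7077")        // placeholder
      .setDeployMode("cluster")
      .startApplication(new SparkAppHandle.Listener {
        // Fired on UNKNOWN -> CONNECTED -> SUBMITTED -> RUNNING -> ... transitions
        override def stateChanged(h: SparkAppHandle): Unit = {
          println(s"state: ${h.getState}")
          if (h.getState.isFinal) finished.countDown()
        }
        override def infoChanged(h: SparkAppHandle): Unit =
          println(s"appId: ${h.getAppId}")
      })
    finished.await()
  }
}

As the observations above suggest, the state reported here tracks the driver rather
than executor resource allocation, so detecting an application stuck in WAITING still
needs something extra, such as polling the standalone master's JSON endpoint.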







--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/




Re: Add snappy support for spark in Windows

2017-12-04 Thread Junfeng Chen
But I am working on my local development machine, so it should have no
relation to workers/executors.

I found some documents about enabling Snappy on Hadoop. If I want to use
Snappy with Spark, do I need to configure Spark the same way as Hadoop, or is
there an easier way to enable it?


Regard,
Junfeng Chen

On Mon, Dec 4, 2017 at 4:12 PM, Qiao, Richard 
wrote:

> It seems a common mistake that the path is not accessible by
> workers/executors.
>
> Best regards
> Richard
>
> Sent from my iPhone
>
> On Dec 3, 2017, at 22:32, Junfeng Chen  wrote:
>
> I am working on importing snappy compressed json file into spark rdd or
> dataset. However I meet this error: java.lang.UnsatisfiedLinkError:
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>
> I have set the following configuration:
>
> SparkConf conf = new SparkConf()
> .setAppName("normal spark")
> .setMaster("local")
> .set("spark.io.compression.codec", 
> "org.apache.spark.io.SnappyCompressionCodec")
> 
> .set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
> 
> .set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
> 
> .set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
> 
> .set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
> ;
>
> Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my spark unpacked path,
> and I can find the snappy jar file snappy-0.2.jar and
> snappy-java-1.1.2.6.jar in
>
> D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\
>
> However nothing works and even the error message not change.
>
> How can I fix it?
>
>
> ref of stackoverflow: https://stackoverflow.com/questions/47626012/
> config-snappy-support-for-spark-in-windows
> 
>
>
>
> Regard,
> Junfeng Chen
>
>


Re: Add snappy support for spark in Windows

2017-12-04 Thread Qiao, Richard
It seems to be a common mistake: the path is not accessible by the workers/executors.

Best regards
Richard

Sent from my iPhone

On Dec 3, 2017, at 22:32, Junfeng Chen wrote:


I am working on importing a Snappy-compressed JSON file into a Spark RDD or
Dataset. However, I get this error: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

I have set the following configuration:

SparkConf conf = new SparkConf()
.setAppName("normal spark")
.setMaster("local")
.set("spark.io.compression.codec", 
"org.apache.spark.io.SnappyCompressionCodec")

.set("spark.driver.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")

.set("spark.driver.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")

.set("spark.executor.extraLibraryPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")

.set("spark.executor.extraClassPath","D:\\Downloads\\spark-2.2.0-bin-hadoop2.7\\spark-2.2.0-bin-hadoop2.7\\jars")
;

Where D:\Downloads\spark-2.2.0-bin-hadoop2.7 is my Spark unpacked path, and I
can find the Snappy jar files snappy-0.2.jar and snappy-java-1.1.2.6.jar in

D:\Downloads\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7\jars\

However, nothing works and the error message does not even change.

How can I fix it?


ref of stackoverflow: 
https://stackoverflow.com/questions/47626012/config-snappy-support-for-spark-in-windows
 



Regard,
Junfeng Chen

