Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi Ayan,

Delta is obviously well thought through; it's been available in Databricks
for the last year and a half now, I think, and besides that it is from some of
the best minds at work :)

But what may not be well tested in Delta is its availability as a storage
class for Hive.

How is your testing going? Are you doing it on S3? And what kind of volume
are you testing it with, if I may ask?


Regards,
Gourav Sengupta

On Thu, Jun 20, 2019 at 12:58 AM ayan guha  wrote:

> Hi
>
> We are using the Delta features. The only problem we have faced so far is
> that Hive cannot read Delta outputs by itself (even if the Hive metastore is
> shared). However, if we create a Hive external table pointing to the folder
> (and with Vacuum), it can read the data.
>
> Other than that, the feature looks good and well thought out. We are doing
> volume testing now.
>
> Best
> Ayan
>
> On Thu, Jun 20, 2019 at 9:52 AM Liwen Sun 
> wrote:
>
>> Hi Gourav,
>>
>> Thanks for the suggestion. Please open a Github issue at
>> https://github.com/delta-io/delta/issues to describe your use case and
>> requirements for "external tables" so we can better track this feature and
>> also get feedback from the community.
>>
>> Regards,
>> Liwen
>>
>> On Wed, Jun 19, 2019 at 12:11 PM Gourav Sengupta <
>> gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> does Delta support external tables? I think that most users will be
>>> needing this.
>>>
>>>
>>> Regards,
>>> Gourav
>>>
>>> On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun 
>>> wrote:
>>>
 We are delighted to announce the availability of Delta Lake 0.2.0!

 To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
 https://docs.delta.io/0.2.0/quick-start.html

 To view the release notes:
 https://github.com/delta-io/delta/releases/tag/v0.2.0

 This release introduces two main features:

 *Cloud storage support*
 In addition to HDFS, you can now configure Delta Lake to read and write
 data on cloud storage services such as Amazon S3 and Azure Blob Storage.
 For configuration instructions, please see:
 https://docs.delta.io/0.2.0/delta-storage.html

 *Improved concurrency*
 Delta Lake now allows concurrent append-only writes while still
 ensuring serializability. For concurrency control in Delta Lake, please
 see: https://docs.delta.io/0.2.0/delta-concurrency.html

 We have also greatly expanded the test coverage as part of this release.

 We would like to acknowledge all community members for contributing to
 this release.

 Best regards,
 Liwen Sun


>>>
>
> --
> Best Regards,
> Ayan Guha
>


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi Liwen,

it's done: https://github.com/delta-io/delta/issues/73

Please let me know whether the description looks fine. I can also
contribute test cases if required.


Regards,
Gourav

On Thu, Jun 20, 2019 at 12:52 AM Liwen Sun  wrote:

> Hi Gourav,
>
> Thanks for the suggestion. Please open a Github issue at
> https://github.com/delta-io/delta/issues to describe your use case and
> requirements for "external tables" so we can better track this feature and
> also get feedback from the community.
>
> Regards,
> Liwen
>
> On Wed, Jun 19, 2019 at 12:11 PM Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> does Delta support external tables? I think that most users will be
>> needing this.
>>
>>
>> Regards,
>> Gourav
>>
>> On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun 
>> wrote:
>>
>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>
>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>> https://docs.delta.io/0.2.0/quick-start.html
>>>
>>> To view the release notes:
>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>
>>> This release introduces two main features:
>>>
>>> *Cloud storage support*
>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>> For configuration instructions, please see:
>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>
>>> *Improved concurrency*
>>> Delta Lake now allows concurrent append-only writes while still ensuring
>>> serializability. For concurrency control in Delta Lake, please see:
>>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>>
>>> We have also greatly expanded the test coverage as part of this release.
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release.
>>>
>>> Best regards,
>>> Liwen Sun
>>>
>>>
>>


Re: What is the compatibility between releases?

2019-06-19 Thread Yeikel
Hi Community,

I am still looking for an answer to this question.

I am running a cluster using Spark 2.3.1, but I am wondering if it is safe to
include Spark 2.4.1 and use new features such as higher-order functions.

Thank you.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/




Re: Override jars in spark submit

2019-06-19 Thread Keith Chapman
Hi Naresh,

You could use "--conf spark.driver.extraClassPath=<path to jar>". Note
that the jar will not be shipped to the executors; if it's a class that is
needed on the executors as well, you should provide "--conf
spark.executor.extraClassPath=<path to jar>". Note that if you do provide
the executor extraClassPath, the jar file needs to be present on all the
executors.
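
For illustration, a minimal sketch of a submit command using both settings;
the jar name and paths are hypothetical, so adjust them to whichever jar you
want to take precedence:

spark-submit \
  --class com.example.MyApp \
  --conf spark.driver.extraClassPath=/opt/jars/hive-exec-2.3.5.jar \
  --conf spark.executor.extraClassPath=/opt/jars/hive-exec-2.3.5.jar \
  my-app.jar

These entries are prepended to the driver and executor classpaths, which is
what lets them take precedence over the jars shipped with the cluster.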

Regards,
Keith.

http://keith-chapman.com


On Wed, Jun 19, 2019 at 8:57 PM naresh Goud 
wrote:

> Hello All,
>
> How can we override jars in spark submit?
> We have hive-exec-spark jar which is available as part of default spark
> cluster jars.
> We wanted to override above mentioned jar in spark submit with latest
> version jar.
> How do we do that ?
>
>
> Thank you,
> Naresh
> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>


connecting spark with mysql

2019-06-19 Thread ya
Hi everyone,

I am trying to manipulate MySQL tables from Spark. I do not want to move these 
tables from MySQL into Spark, as they can easily get very big; ideally the data 
stays in the database where it is stored. For me, Spark is only used to speed up 
the read and write process (I am more a data analyst than an application 
developer), so I did not install Hadoop. People here have helped me a lot, but I 
still cannot connect MySQL to Spark; possible reasons include the Java version, 
the location of the Java files, the location of the connector files, the MySQL 
version, the location of the environment variables, the use of JDBC or ODBC, and 
so on. My questions are:

1. Do we need to install Hadoop and Java before installing Spark?

2. Which versions of these packages are stable for a successful installation 
and connection, if anyone has experience with this? (The solutions online may 
have worked on older versions of these packages, but they do not seem to work 
in my case; I'm on a Mac, by the way.)

3. So far, the only approach I have tried successfully is using the sqldf 
package with SparkR to connect to MySQL. But does that mean Spark is actually 
doing the work (speeding up the process) when I run SQL queries through sqldf 
on SparkR?
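
For reference, the usual way to query MySQL in place from Spark is a JDBC
read. Below is a minimal pyspark sketch; the host, database, table, and
credentials are hypothetical, and it assumes the MySQL Connector/J jar has
been put on the classpath (for example via --jars):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-jdbc-demo").getOrCreate()

# The table stays in MySQL; Spark pulls rows over JDBC as queries need them.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydb")
      .option("dbtable", "mytable")
      .option("user", "analyst")
      .option("password", "secret")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .load())

df.createOrReplaceTempView("mytable")
spark.sql("SELECT COUNT(*) FROM mytable").show()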

I hope I described my questions clearly. Thank you very much for the help.

Best regards,

YA




Re: Spark SQL

2019-06-19 Thread naresh Goud
Just to make it more clear: Spark SQL uses the Hive metastore but runs queries
using its own engine; it does not use the Hive execution engine.

Please correct me if it’s not true.
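
A minimal pyspark sketch of that split (the table name is hypothetical): the
catalog lookup goes to the Hive metastore, while planning and execution stay
inside Spark.

from pyspark.sql import SparkSession

# enableHiveSupport() makes Spark use the Hive metastore as its catalog;
# the query below is still planned by Catalyst and executed by Spark.
spark = (SparkSession.builder
         .appName("hive-metastore-demo")
         .enableHiveSupport()
         .getOrCreate())

df = spark.sql("SELECT COUNT(*) FROM some_hive_table")  # hypothetical table
df.explain()  # the plan shows Spark operators, not Hive/MapReduce stages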



On Mon, Jun 10, 2019 at 2:29 PM Russell Spitzer 
wrote:

> Spark can use the HiveMetastore as a catalog, but it doesn't use the hive
> parser or optimization engine. Instead it uses Catalyst, see
> https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
>
> On Mon, Jun 10, 2019 at 2:07 PM naresh Goud 
> wrote:
>
>> Hi Team,
>>
>> Is Spark Sql uses hive engine to run queries ?
>> My understanding that spark sql uses hive meta store to get metadata
>> information to run queries.
>>
>> Thank you,
>> Naresh
>> --
>> Thanks,
>> Naresh
>> www.linkedin.com/in/naresh-dulam
>> http://hadoopandspark.blogspot.com/
>>
>> --
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


Override jars in spark submit

2019-06-19 Thread naresh Goud
Hello All,

How can we override jars in spark submit?
We have hive-exec-spark jar which is available as part of default spark
cluster jars.
We wanted to override above mentioned jar in spark submit with latest
version jar.
How do we do that ?


Thank you,
Naresh
-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread ayan guha
Hi

We are using the Delta features. The only problem we have faced so far is that
Hive cannot read Delta outputs by itself (even if the Hive metastore is
shared). However, if we create a Hive external table pointing to the folder
(and with Vacuum), it can read the data; a sketch of that workaround follows
below.

Other than that, the feature looks good and well thought out. We are doing
volume testing now.
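
A minimal sketch of the workaround, with hypothetical table name, schema, and
path; it assumes the table has been vacuumed so only the current Parquet files
remain in the folder, and that the metastore is shared with Hive:

# Delta writes its data files as Parquet, which is what lets a plain
# external Parquet table read them once stale files are vacuumed away.
spark.sql("""
    CREATE EXTERNAL TABLE events_ext (id BIGINT, word STRING)
    STORED AS PARQUET
    LOCATION '/data/delta/events'
""")

Without the Vacuum step, the folder still contains files from older table
versions, so a reader that bypasses the Delta log may see duplicate or
deleted rows.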

Best
Ayan

On Thu, Jun 20, 2019 at 9:52 AM Liwen Sun  wrote:

> Hi Gourav,
>
> Thanks for the suggestion. Please open a Github issue at
> https://github.com/delta-io/delta/issues to describe your use case and
> requirements for "external tables" so we can better track this feature and
> also get feedback from the community.
>
> Regards,
> Liwen
>
> On Wed, Jun 19, 2019 at 12:11 PM Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> does Delta support external tables? I think that most users will be
>> needing this.
>>
>>
>> Regards,
>> Gourav
>>
>> On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun 
>> wrote:
>>
>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>
>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>> https://docs.delta.io/0.2.0/quick-start.html
>>>
>>> To view the release notes:
>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>
>>> This release introduces two main features:
>>>
>>> *Cloud storage support*
>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>> For configuration instructions, please see:
>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>
>>> *Improved concurrency*
>>> Delta Lake now allows concurrent append-only writes while still ensuring
>>> serializability. For concurrency control in Delta Lake, please see:
>>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>>
>>> We have also greatly expanded the test coverage as part of this release.
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release.
>>>
>>> Best regards,
>>> Liwen Sun
>>>
>>>
>>

-- 
Best Regards,
Ayan Guha


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Liwen Sun
Hi Gourav,

Thanks for the suggestion. Please open a Github issue at
https://github.com/delta-io/delta/issues to describe your use case and
requirements for "external tables" so we can better track this feature and
also get feedback from the community.

Regards,
Liwen

On Wed, Jun 19, 2019 at 12:11 PM Gourav Sengupta 
wrote:

> Hi,
>
> does Delta support external tables? I think that most users will be
> needing this.
>
>
> Regards,
> Gourav
>
> On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun 
> wrote:
>
>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>
>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>> https://docs.delta.io/0.2.0/quick-start.html
>>
>> To view the release notes:
>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>
>> This release introduces two main features:
>>
>> *Cloud storage support*
>> In addition to HDFS, you can now configure Delta Lake to read and write
>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>> For configuration instructions, please see:
>> https://docs.delta.io/0.2.0/delta-storage.html
>>
>> *Improved concurrency*
>> Delta Lake now allows concurrent append-only writes while still ensuring
>> serializability. For concurrency control in Delta Lake, please see:
>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>
>> We have also greatly expanded the test coverage as part of this release.
>>
>> We would like to acknowledge all community members for contributing to
>> this release.
>>
>> Best regards,
>> Liwen Sun
>>
>>
>


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi,

does Delta support external tables? I think that most users will be needing
this.


Regards,
Gourav

On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun  wrote:

> We are delighted to announce the availability of Delta Lake 0.2.0!
>
> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
> https://docs.delta.io/0.2.0/quick-start.html
>
> To view the release notes:
> https://github.com/delta-io/delta/releases/tag/v0.2.0
>
> This release introduces two main features:
>
> *Cloud storage support*
> In addition to HDFS, you can now configure Delta Lake to read and write
> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
> For configuration instructions, please see:
> https://docs.delta.io/0.2.0/delta-storage.html
>
> *Improved concurrency*
> Delta Lake now allows concurrent append-only writes while still ensuring
> serializability. For concurrency control in Delta Lake, please see:
> https://docs.delta.io/0.2.0/delta-concurrency.html
>
> We have also greatly expanded the test coverage as part of this release.
>
> We would like to acknowledge all community members for contributing to
> this release.
>
> Best regards,
> Liwen Sun
>
>


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi,

this is fantastic :)

Regards,
Gourav Sengupta

On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun  wrote:

> We are delighted to announce the availability of Delta Lake 0.2.0!
>
> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
> https://docs.delta.io/0.2.0/quick-start.html
>
> To view the release notes:
> https://github.com/delta-io/delta/releases/tag/v0.2.0
>
> This release introduces two main features:
>
> *Cloud storage support*
> In addition to HDFS, you can now configure Delta Lake to read and write
> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
> For configuration instructions, please see:
> https://docs.delta.io/0.2.0/delta-storage.html
>
> *Improved concurrency*
> Delta Lake now allows concurrent append-only writes while still ensuring
> serializability. For concurrency control in Delta Lake, please see:
> https://docs.delta.io/0.2.0/delta-concurrency.html
>
> We have also greatly expanded the test coverage as part of this release.
>
> We would like to acknowledge all community members for contributing to
> this release.
>
> Best regards,
> Liwen Sun
>
>


Announcing Delta Lake 0.2.0

2019-06-19 Thread Liwen Sun
We are delighted to announce the availability of Delta Lake 0.2.0!

To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
https://docs.delta.io/0.2.0/quick-start.html

To view the release notes:
https://github.com/delta-io/delta/releases/tag/v0.2.0

This release introduces two main features:

*Cloud storage support*
In addition to HDFS, you can now configure Delta Lake to read and write
data on cloud storage services such as Amazon S3 and Azure Blob Storage.
For configuration instructions, please see:
https://docs.delta.io/0.2.0/delta-storage.html
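
As a quick illustration, a minimal pyspark sketch of writing to and reading
from S3; the bucket is hypothetical, and it assumes delta-core 0.2.0 plus the
usual S3 credential configuration are in place, along with the LogStore
settings described in the storage docs above:

df = spark.range(5)
df.write.format("delta").save("s3a://my-bucket/delta/events")
spark.read.format("delta").load("s3a://my-bucket/delta/events").show()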

*Improved concurrency*
Delta Lake now allows concurrent append-only writes while still ensuring
serializability. For concurrency control in Delta Lake, please see:
https://docs.delta.io/0.2.0/delta-concurrency.html
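
For example, two jobs can now safely run an append like the following against
the same table at the same time (path hypothetical); optimistic concurrency
control commits both, while conflicting operations fail cleanly instead of
corrupting the table:

df.write.format("delta").mode("append").save("/data/delta/events")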

We have also greatly expanded the test coverage as part of this release.

We would like to acknowledge all community members for contributing to this
release.

Best regards,
Liwen Sun


pyspark cached dataframe shows deserialized at StorageLevel

2019-06-19 Thread Mitsutoshi Kiuchi

Hi,

The Spark documentation says "Since the data is always serialized on the
Python side, all the constants use the serialized formats.":

http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.StorageLevel

But when I cache a dataframe and look at its StorageLevel, it shows that the
cached dataframe is deserialized. This looks like wrong behavior with respect
to the documentation; could anyone please comment on it?


Here is code to reproduce.

from pyspark.sql.functions import col, length

source = [
    ("frighten",), ("watch",), ("dish",),
    ("reflect",), ("serious",), ("summer",),
    ("embrace",), ("transition",), ("Venus",)]
dfa = spark.createDataFrame(source, ["word"]).withColumn("num", length(col("word")))

dfa2 = dfa.cache()

dfa.storageLevel   # shows "StorageLevel(True, True, False, True, 1)"
dfa2.storageLevel  # same as above


Thanks for reading!

Mitsutoshi Kiuchi




RE: tcps oracle connection from spark

2019-06-19 Thread Luca Canali
Connecting to Oracle from Spark using the TCPS protocol works OK for me.
Maybe try turning debug on with -Djavax.net.debug=all?
See also:
https://blogs.oracle.com/dev2dev/ssl-connection-to-oracle-db-using-jdbc%2c-tlsv12%2c-jks-or-oracle-wallets
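
A minimal sketch of wiring that flag into a Spark job; the handshake happens
wherever the JDBC connection is opened, so setting it on both the driver and
the executors is the safe assumption:

spark-submit \
  --conf spark.driver.extraJavaOptions=-Djavax.net.debug=all \
  --conf spark.executor.extraJavaOptions=-Djavax.net.debug=all \
  your-app.jar

The SSL/TLS handshake trace then shows up in the driver and executor logs.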

Regards,
L.

From: Richard Xin 
Sent: Wednesday, June 19, 2019 00:51
To: User 
Subject: Re: tcps oracle connection from spark

And btw, the same connection string works fine when used in SQL Developer.

On Tuesday, June 18, 2019, 03:49:24 PM PDT, Richard Xin
<richardxin...@yahoo.com> wrote:


Hi, I need help with a TCPS Oracle connection from Spark (version:
spark-2.4.0-bin-hadoop2.7).


Properties prop = new Properties();
prop.putAll(sparkOracle);  // username/password

prop.put("javax.net.ssl.trustStore", "path to root.jks");
prop.put("javax.net.ssl.trustStorePassword", "password_here");

df.write()
    .mode(SaveMode.Append)
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .jdbc("jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcps)(HOST=host.mycomapny.com)(PORT=1234)))(CONNECT_DATA=(SERVICE_NAME=service_name)))", "tableName", prop);


note "PROTOCOL=tcps" in the connection string.

The code worked fine for "tcp" hosts, but some of our servers use "tcps" only, 
I got following errors when hitting oracld tcps hosts, can someone shed some 
light? Thanks a lot!

Exception in thread "main" java.sql.SQLRecoverableException: IO Error: Remote host terminated the handshake
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:682)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:715)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:385)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:30)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:564)
at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:48)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:506)
at com.apple.jmet.pallas.data_migration.DirectMigrationWConfig.main(DirectMigrationWConfig.java:103)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit

[webinar] TFX Chicago Taxi example on Mini Kubeflow (MiniKF)

2019-06-19 Thread Chris Pavlou

Hi all,

I would like to invite you to our webinar "Kubeflow Pipelines on-prem". 
It will take place on Friday, June 21 at 9am Pacific Time. You can 
register here:


https://zoom.us/webinar/register/WN_j_HJbkISTluMckyyr706eg

We are going to demonstrate the end-to-end TFX Chicago Taxi example 
running on-prem using MiniKF. MiniKF is a single-node deployment of 
Kubeflow that you can install on-prem or on your laptop.


https://www.kubeflow.org/docs/started/getting-started-minikf/

Looking forward to seeing you on Friday!

Best,
Chris




Unsubscribe

2019-06-19 Thread Tushar Marne
-- 
Tushar Marne
9011062432


Re: Ask for ARM CI for spark

2019-06-19 Thread Tianhua huang
Thanks for your reply.

As I said before, I ran into some problems building and testing Spark on an
aarch64 server, so it would be better to have ARM CI to make sure Spark is
compatible with AArch64 platforms.

I'm from the OpenLab team (https://openlabtesting.org/), a community that does
open source project testing. We can contribute some Arm virtual machines to
AMPLab Jenkins, and we also have a developer team willing to work on this: we
are willing to maintain the build CI jobs and address any CI issues. What do
you think?


Thanks for your attention.

On Wed, Jun 19, 2019 at 6:39 AM shane knapp  wrote:

> yeah, we don't have any aarch64 systems for testing...  this has been
> asked before but is currently pretty low on our priority list as we don't
> have the hardware.
>
> sorry,
>
> shane
>
> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang 
> wrote:
>
>> Hi, sorry to disturb you.
>> The CI testing for Apache Spark is supported by AMPLab Jenkins, and I see
>> there are some machines (most of them Linux amd64) for CI, but it seems
>> there is no AArch64 machine for Spark CI testing. Recently I built and ran
>> tests for Spark (master and branch-2.4) on my ARM server, and unfortunately
>> there are some problems; for example, a unit test fails due to a LEVELDBJNI
>> native package. For details, see http://paste.openstack.org/show/752063/
>> for the Java tests and http://paste.openstack.org/show/752709/ for the
>> Python tests.
>> So I have a question about ARM CI testing for Spark: is there any plan to
>> support it? Thank you very much and I will wait for your reply!
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>