[VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-19 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version
2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5
(13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues:
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions from 1.x.
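
For example, one low-effort way to test is to point an existing sbt project at the staging repository above and re-run your workload against the RC binaries. A minimal build sketch, assuming the staged artifacts are published under version 2.0.0 and that your project builds against Scala 2.11:

    // build.sbt -- sketch for compiling an existing workload against the RC5 staging artifacts.
    // The staging URL comes from this vote thread; the artifact version (2.0.0) and Scala
    // version (2.11) are assumptions about how the RC was published.
    scalaVersion := "2.11.8"

    resolvers += "Apache Spark 2.0.0 RC5 staging" at "https://repository.apache.org/content/repositories/orgapachespark-1195/"

    libraryDependencies ++= Seq(
      // "provided" assumes you submit the job with spark-submit against the RC distribution.
      "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.0.0" % "provided"
    )

Rebuild, run the job on the RC binaries linked above, and report anything that behaves differently from 1.x.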

==
What justifies a -1 vote for this release?
==
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new
features will not necessarily block this release. Note that historically
Spark documentation has been published on the website separately from the
main release so we do not need to block the release due to documentation
errors either.


HiveContext: difficulties accessing tables in Hive schemas/databases other than the default database

2016-07-19 Thread satyajit vegesna
Hi All,

I have been trying to access tables in schemas other than the default one to pull data into a DataFrame.

I was able to do this with the default schema in the Hive database, but when I try any other schema/database in Hive I get the error below. (I have also not found any examples of accessing tables in a schema/database other than the default.)
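
For reference, a minimal sketch (Spark 1.x-style API, matching the stack trace below) of the two usual ways to read a table from a non-default Hive database; the database and table names are placeholders. Note that the NoSuchMethodError in the log below is typically a symptom of mismatched Hive jars on the classpath rather than of the database qualifier itself:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("sparkHive"))
    val hiveContext = new HiveContext(sc)

    // Option 1: qualify the table name with the database.
    val df1 = hiveContext.sql("SELECT * FROM mydb.mytable")

    // Option 2: switch the current database, then use the unqualified table name.
    hiveContext.sql("USE mydb")
    val df2 = hiveContext.table("mytable")

    df1.show()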

16/07/19 18:16:06 INFO hive.metastore: Connected to metastore.
16/07/19 18:16:08 INFO storage.MemoryStore: Block broadcast_0 stored as
values in memory (estimated size 472.3 KB, free 472.3 KB)
16/07/19 18:16:08 INFO storage.MemoryStore: Block broadcast_0_piece0 stored
as bytes in memory (estimated size 39.6 KB, free 511.9 KB)
16/07/19 18:16:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0
in memory on localhost:41434 (size: 39.6 KB, free: 2.4 GB)
16/07/19 18:16:08 INFO spark.SparkContext: Created broadcast 0 from show at
sparkHive.scala:70
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.Utilities.copyTableJobPropertiesToConf(Lorg/apache/hadoop/hive/ql/plan/TableDesc;Lorg/apache/hadoop/mapred/JobConf;)V
at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:324)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1538)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1538)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2125)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1537)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1544)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1414)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1413)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2138)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
at

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-19 Thread Holden Karau
Ah in that case: 0


On Tue, Jul 19, 2016 at 3:26 PM, Jonathan Kelly 
wrote:

> The docs link from Reynold's initial email is apparently no longer valid.
> He posted an updated link a little later in this same thread.
>
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs-updated/
>
> On Tue, Jul 19, 2016 at 3:19 PM Holden Karau  wrote:
>
>> -1 : The docs don't seem to be fully built (e.g.
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/streaming-programming-guide.html
>> is a zero byte file currently) - although if this is a transient apache
>> issue no worries.
>>
>> On Thu, Jul 14, 2016 at 11:59 AM, Reynold Xin 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.0.0. The vote is open until Sunday, July 17, 2016 at 12:00 PDT and passes
>>> if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.0.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> The tag to be voted on is v2.0.0-rc4
>>> (e5f8c1117e0c48499f54d62b556bc693435afae0).
>>>
>>> This release candidate resolves ~2500 issues:
>>> https://s.apache.org/spark-2.0.0-jira
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1192/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/
>>>
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions from 1.x.
>>>
>>> ==
>>> What justifies a -1 vote for this release?
>>> ==
>>> Critical bugs impacting major functionalities.
>>>
>>> Bugs already present in 1.x, missing features, or bugs related to new
>>> features will not necessarily block this release. Note that historically
>>> Spark documentation has been published on the website separately from the
>>> main release so we do not need to block the release due to documentation
>>> errors either.
>>>
>>>
>>> Note: There was a mistake made during "rc3" preparation, and as a result
>>> there is no "rc3", but only "rc4".
>>>
>>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-19 Thread Jonathan Kelly
The docs link from Reynold's initial email is apparently no longer valid.
He posted an updated link a little later in this same thread.

http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs-updated/

On Tue, Jul 19, 2016 at 3:19 PM Holden Karau  wrote:

> -1 : The docs don't seem to be fully built (e.g.
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/streaming-programming-guide.html
> is a zero byte file currently) - although if this is a transient apache
> issue no worries.
>
> On Thu, Jul 14, 2016 at 11:59 AM, Reynold Xin  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.0.0. The vote is open until Sunday, July 17, 2016 at 12:00 PDT and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.0.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> The tag to be voted on is v2.0.0-rc4
>> (e5f8c1117e0c48499f54d62b556bc693435afae0).
>>
>> This release candidate resolves ~2500 issues:
>> https://s.apache.org/spark-2.0.0-jira
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1192/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/
>>
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions from 1.x.
>>
>> ==
>> What justifies a -1 vote for this release?
>> ==
>> Critical bugs impacting major functionalities.
>>
>> Bugs already present in 1.x, missing features, or bugs related to new
>> features will not necessarily block this release. Note that historically
>> Spark documentation has been published on the website separately from the
>> main release so we do not need to block the release due to documentation
>> errors either.
>>
>>
>> Note: There was a mistake made during "rc3" preparation, and as a result
>> there is no "rc3", but only "rc4".
>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Missing Exector Logs From Yarn After Spark Failure

2016-07-19 Thread Rachana Srivastava
I am trying to find the root cause of a recent Spark application failure in 
production. While a Spark application is running, I can check the NodeManager's 
yarn.nodemanager.log-dir property to find the Spark executor container logs.

The container directory has logs for both running Spark applications.

Here is a listing of the container logs: drwx--x--- 3 yarn yarn 51 Jul 19 09:04 
application_1467068598418_0209 drwx--x--- 5 yarn yarn 141 Jul 19 09:04 
application_1467068598418_0210

But when an application is killed, both application log directories are automatically 
deleted. I have set all the log retention settings in YARN to very large values, yet 
these logs are still deleted as soon as the Spark applications crash.

Question: How can we retain these Spark application logs in YARN for debugging 
when a Spark application crashes for some reason?


Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-19 Thread Holden Karau
-1 : The docs don't seem to be fully built (e.g.
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/streaming-programming-guide.html
is a zero byte file currently) - although if this is a transient apache
issue no worries.

On Thu, Jul 14, 2016 at 11:59 AM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Sunday, July 17, 2016 at 12:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc4
> (e5f8c1117e0c48499f54d62b556bc693435afae0).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1192/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
> Note: There was a mistake made during "rc3" preparation, and as a result
> there is no "rc3", but only "rc4".
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-19 Thread Marcelo Vanzin
+0

Our internal test suites seem mostly happy, except for SPARK-16632.
Since there's a somewhat easy workaround, I don't think it's a blocker
for 2.0.0.

On Thu, Jul 14, 2016 at 11:59 AM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Sunday, July 17, 2016 at 12:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc4
> (e5f8c1117e0c48499f54d62b556bc693435afae0).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1192/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
> Note: There was a mistake made during "rc3" preparation, and as a result
> there is no "rc3", but only "rc4".
>



-- 
Marcelo




Re: transition SQLContext to SparkSession

2016-07-19 Thread Michael Allman
Hi Reynold,

So far we've been able to transition everything to `SparkSession`. I was just 
following up on behalf of Maciej.

Michael

> On Jul 19, 2016, at 11:02 AM, Reynold Xin  wrote:
> 
> dropping user list
> 
> Yup I just took a look -- you are right.
> 
> What's the reason you'd need a HiveContext? The only method that HiveContext 
> has and SQLContext does not have is refreshTable. Given this is meant for 
> helping code transition, it might be easier to just use SQLContext and change 
> the places that use refreshTable?
> 
> In order for SparkSession.sqlContext to return an actual HiveContext, we'd 
> need to use reflection to create a HiveContext, which is pretty hacky.
> 
> 
> 
> On Tue, Jul 19, 2016 at 10:58 AM, Michael Allman wrote:
> Sorry Reynold, I want to triple check this with you. I'm looking at the 
> `SparkSession.sqlContext` field in the latest 2.0 branch, and it appears that 
> that val is set specifically to an instance of the `SQLContext` class. A cast 
> to `HiveContext` will fail. Maybe there's a misunderstanding here. This is 
> what I'm looking at:
> 
> https://github.com/apache/spark/blob/24ea875198ffcef4a4c3ba28aba128d6d7d9a395/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L122
>  
> 
> 
> Michael
> 
> 
> 
>> On Jul 19, 2016, at 10:01 AM, Reynold Xin wrote:
>> 
>> Yes. But in order to access methods available only in HiveContext a user 
>> cast is required. 
>> 
>> On Tuesday, July 19, 2016, Maciej Bryński wrote:
>> @Reynold Xin,
>> How this will work with Hive Support ?
>> SparkSession.sqlContext return HiveContext ?
>> 
>> 2016-07-19 0:26 GMT+02:00 Reynold Xin:
>> > Good idea.
>> >
>> > https://github.com/apache/spark/pull/14252 
>> > 
>> >
>> >
>> >
>> > On Mon, Jul 18, 2016 at 12:16 PM, Michael Armbrust wrote:
>> >>
>> >> + dev, reynold
>> >>
>> >> Yeah, thats a good point.  I wonder if SparkSession.sqlContext should be
>> >> public/deprecated?
>> >>
>> >> On Mon, Jul 18, 2016 at 8:37 AM, Koert Kuipers wrote:
>> >>>
>> >>> in my codebase i would like to gradually transition to SparkSession, so
>> >>> while i start using SparkSession i also want a SQLContext to be 
>> >>> available as
>> >>> before (but with a deprecated warning when i use it). this should be easy
>> >>> since SQLContext is now a wrapper for SparkSession.
>> >>>
>> >>> so basically:
>> >>> val session = SparkSession.builder.set(..., ...).getOrCreate()
>> >>> val sqlc = new SQLContext(session)
>> >>>
>> >>> however this doesnt work, the SQLContext constructor i am trying to use
>> >>> is private. SparkSession.sqlContext is also private.
>> >>>
>> >>> am i missing something?
>> >>>
>> >>> a non-gradual switch is not very realistic in any significant codebase,
>> >>> and i do not want to create SparkSession and SQLContext independendly 
>> >>> (both
>> >>> from same SparkContext) since that can only lead to confusion and
>> >>> inconsistent settings.
>> >>
>> >>
>> >
>> 
>> 
>> 
>> --
>> Maciek Bryński
> 
> 



Re: Build changes after SPARK-13579

2016-07-19 Thread Michael Gummelt
This line: "build/sbt clean assembly"

should also be changed, right?

On Tue, Jul 19, 2016 at 1:18 AM, Sean Owen  wrote:

> If the change is just to replace "sbt assembly/assembly" with "sbt
> package", done. LMK if there are more edits.
>
> On Mon, Jul 18, 2016 at 10:00 PM, Michael Gummelt
>  wrote:
> > I just flailed on this a bit before finding this email.  Can someone
> please
> > update
> >
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
> >
> > On Mon, Apr 4, 2016 at 10:01 PM, Reynold Xin 
> wrote:
> >>
> >> pyspark and R
> >>
> >> On Mon, Apr 4, 2016 at 9:59 PM, Marcelo Vanzin 
> >> wrote:
> >>>
> >>> No, tests (except pyspark) should work without having to package
> anything
> >>> first.
> >>>
> >>> On Mon, Apr 4, 2016 at 9:58 PM, Koert Kuipers 
> wrote:
> >>> > do i need to run sbt package before doing tests?
> >>> >
> >>> > On Mon, Apr 4, 2016 at 11:00 PM, Marcelo Vanzin wrote:
> >>> >>
> >>> >> Hey all,
> >>> >>
> >>> >> We merged  SPARK-13579 today, and if you're like me and have your
> >>> >> hands automatically type "sbt assembly" anytime you're building
> Spark,
> >>> >> that won't work anymore.
> >>> >>
> >>> >> You should now use "sbt package"; you'll still need "sbt assembly"
> if
> >>> >> you require one of the remaining assemblies (streaming connectors,
> >>> >> yarn shuffle service).
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Marcelo
> >>> >>
> >>> >>
> -
> >>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>> >>
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Marcelo
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>> For additional commands, e-mail: dev-h...@spark.apache.org
> >>>
> >>
> >
> >
> >
> > --
> > Michael Gummelt
> > Software Engineer
> > Mesosphere
>



-- 
Michael Gummelt
Software Engineer
Mesosphere


Re: transition SQLContext to SparkSession

2016-07-19 Thread Reynold Xin
dropping user list

Yup I just took a look -- you are right.

What's the reason you'd need a HiveContext? The only method that
HiveContext has and SQLContext does not have is refreshTable. Given this is
meant for helping code transition, it might be easier to just use
SQLContext and change the places that use refreshTable?

In order for SparkSession.sqlContext to return an actual HiveContext, we'd
need to use reflection to create a HiveContext, which is pretty hacky.
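
For illustration, a minimal sketch of the transition path being discussed, assuming the Spark 2.0 SparkSession and Catalog APIs; the app and table names are placeholders:

    import org.apache.spark.sql.SparkSession

    // Replaces constructing a HiveContext directly.
    val spark = SparkSession.builder()
      .appName("migration-example")
      .enableHiveSupport()
      .getOrCreate()

    // Old: hiveContext.sql(...) / sqlContext.sql(...)
    val df = spark.sql("SELECT count(*) FROM some_table")

    // Old: hiveContext.refreshTable("some_table")
    spark.catalog.refreshTable("some_table")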



On Tue, Jul 19, 2016 at 10:58 AM, Michael Allman 
wrote:

> Sorry Reynold, I want to triple check this with you. I'm looking at the
> `SparkSession.sqlContext` field in the latest 2.0 branch, and it appears
> that that val is set specifically to an instance of the `SQLContext` class.
> A cast to `HiveContext` will fail. Maybe there's a misunderstanding here.
> This is what I'm looking at:
>
>
> https://github.com/apache/spark/blob/24ea875198ffcef4a4c3ba28aba128d6d7d9a395/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L122
>
> Michael
>
>
>
> On Jul 19, 2016, at 10:01 AM, Reynold Xin  wrote:
>
> Yes. But in order to access methods available only in HiveContext a user
> cast is required.
>
> On Tuesday, July 19, 2016, Maciej Bryński  wrote:
>
>> @Reynold Xin,
>> How this will work with Hive Support ?
>> SparkSession.sqlContext return HiveContext ?
>>
>> 2016-07-19 0:26 GMT+02:00 Reynold Xin :
>> > Good idea.
>> >
>> > https://github.com/apache/spark/pull/14252
>> >
>> >
>> >
>> > On Mon, Jul 18, 2016 at 12:16 PM, Michael Armbrust <
>> mich...@databricks.com>
>> > wrote:
>> >>
>> >> + dev, reynold
>> >>
>> >> Yeah, thats a good point.  I wonder if SparkSession.sqlContext should
>> be
>> >> public/deprecated?
>> >>
>> >> On Mon, Jul 18, 2016 at 8:37 AM, Koert Kuipers 
>> wrote:
>> >>>
>> >>> in my codebase i would like to gradually transition to SparkSession,
>> so
>> >>> while i start using SparkSession i also want a SQLContext to be
>> available as
>> >>> before (but with a deprecated warning when i use it). this should be
>> easy
>> >>> since SQLContext is now a wrapper for SparkSession.
>> >>>
>> >>> so basically:
>> >>> val session = SparkSession.builder.set(..., ...).getOrCreate()
>> >>> val sqlc = new SQLContext(session)
>> >>>
>> >>> however this doesnt work, the SQLContext constructor i am trying to
>> use
>> >>> is private. SparkSession.sqlContext is also private.
>> >>>
>> >>> am i missing something?
>> >>>
>> >>> a non-gradual switch is not very realistic in any significant
>> codebase,
>> >>> and i do not want to create SparkSession and SQLContext independendly
>> (both
>> >>> from same SparkContext) since that can only lead to confusion and
>> >>> inconsistent settings.
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Maciek Bryński
>>
>
>


Re: transition SQLContext to SparkSession

2016-07-19 Thread Michael Allman
Sorry Reynold, I want to triple check this with you. I'm looking at the 
`SparkSession.sqlContext` field in the latest 2.0 branch, and it appears that 
that val is set specifically to an instance of the `SQLContext` class. A cast 
to `HiveContext` will fail. Maybe there's a misunderstanding here. This is what 
I'm looking at:

https://github.com/apache/spark/blob/24ea875198ffcef4a4c3ba28aba128d6d7d9a395/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L122

Michael


> On Jul 19, 2016, at 10:01 AM, Reynold Xin  wrote:
> 
> Yes. But in order to access methods available only in HiveContext a user cast 
> is required. 
> 
> On Tuesday, July 19, 2016, Maciej Bryński wrote:
> @Reynold Xin,
> How this will work with Hive Support ?
> SparkSession.sqlContext return HiveContext ?
> 
> 2016-07-19 0:26 GMT+02:00 Reynold Xin:
> > Good idea.
> >
> > https://github.com/apache/spark/pull/14252 
> > 
> >
> >
> >
> > On Mon, Jul 18, 2016 at 12:16 PM, Michael Armbrust wrote:
> >>
> >> + dev, reynold
> >>
> >> Yeah, thats a good point.  I wonder if SparkSession.sqlContext should be
> >> public/deprecated?
> >>
> >> On Mon, Jul 18, 2016 at 8:37 AM, Koert Kuipers wrote:
> >>>
> >>> in my codebase i would like to gradually transition to SparkSession, so
> >>> while i start using SparkSession i also want a SQLContext to be available 
> >>> as
> >>> before (but with a deprecated warning when i use it). this should be easy
> >>> since SQLContext is now a wrapper for SparkSession.
> >>>
> >>> so basically:
> >>> val session = SparkSession.builder.set(..., ...).getOrCreate()
> >>> val sqlc = new SQLContext(session)
> >>>
> >>> however this doesnt work, the SQLContext constructor i am trying to use
> >>> is private. SparkSession.sqlContext is also private.
> >>>
> >>> am i missing something?
> >>>
> >>> a non-gradual switch is not very realistic in any significant codebase,
> >>> and i do not want to create SparkSession and SQLContext independendly 
> >>> (both
> >>> from same SparkContext) since that can only lead to confusion and
> >>> inconsistent settings.
> >>
> >>
> >
> 
> 
> 
> --
> Maciek Bryński



Re: transition SQLContext to SparkSession

2016-07-19 Thread Reynold Xin
Yes. But in order to access methods available only in HiveContext a user
cast is required.

On Tuesday, July 19, 2016, Maciej Bryński  wrote:

> @Reynold Xin,
> How this will work with Hive Support ?
> SparkSession.sqlContext return HiveContext ?
>
> 2016-07-19 0:26 GMT+02:00 Reynold Xin:
> > Good idea.
> >
> > https://github.com/apache/spark/pull/14252
> >
> >
> >
> > On Mon, Jul 18, 2016 at 12:16 PM, Michael Armbrust <mich...@databricks.com> wrote:
> >>
> >> + dev, reynold
> >>
> >> Yeah, thats a good point.  I wonder if SparkSession.sqlContext should be
> >> public/deprecated?
> >>
> >> On Mon, Jul 18, 2016 at 8:37 AM, Koert Kuipers wrote:
> >>>
> >>> in my codebase i would like to gradually transition to SparkSession, so
> >>> while i start using SparkSession i also want a SQLContext to be
> available as
> >>> before (but with a deprecated warning when i use it). this should be
> easy
> >>> since SQLContext is now a wrapper for SparkSession.
> >>>
> >>> so basically:
> >>> val session = SparkSession.builder.set(..., ...).getOrCreate()
> >>> val sqlc = new SQLContext(session)
> >>>
> >>> however this doesnt work, the SQLContext constructor i am trying to use
> >>> is private. SparkSession.sqlContext is also private.
> >>>
> >>> am i missing something?
> >>>
> >>> a non-gradual switch is not very realistic in any significant codebase,
> >>> and i do not want to create SparkSession and SQLContext independendly
> (both
> >>> from same SparkContext) since that can only lead to confusion and
> >>> inconsistent settings.
> >>
> >>
> >
>
>
>
> --
> Maciek Bryński
>


Re: transition SQLContext to SparkSession

2016-07-19 Thread Maciej Bryński
@Reynold Xin,
How this will work with Hive Support ?
SparkSession.sqlContext return HiveContext ?

2016-07-19 0:26 GMT+02:00 Reynold Xin :
> Good idea.
>
> https://github.com/apache/spark/pull/14252
>
>
>
> On Mon, Jul 18, 2016 at 12:16 PM, Michael Armbrust 
> wrote:
>>
>> + dev, reynold
>>
>> Yeah, thats a good point.  I wonder if SparkSession.sqlContext should be
>> public/deprecated?
>>
>> On Mon, Jul 18, 2016 at 8:37 AM, Koert Kuipers  wrote:
>>>
>>> in my codebase i would like to gradually transition to SparkSession, so
>>> while i start using SparkSession i also want a SQLContext to be available as
>>> before (but with a deprecated warning when i use it). this should be easy
>>> since SQLContext is now a wrapper for SparkSession.
>>>
>>> so basically:
>>> val session = SparkSession.builder.set(..., ...).getOrCreate()
>>> val sqlc = new SQLContext(session)
>>>
>>> however this doesnt work, the SQLContext constructor i am trying to use
>>> is private. SparkSession.sqlContext is also private.
>>>
>>> am i missing something?
>>>
>>> a non-gradual switch is not very realistic in any significant codebase,
>>> and i do not want to create SparkSession and SQLContext independendly (both
>>> from same SparkContext) since that can only lead to confusion and
>>> inconsistent settings.
>>
>>
>



-- 
Maciek Bryński




Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-19 Thread nseggert
Can I point out this guy? https://issues.apache.org/jira/browse/SPARK-15705

I managed to find a workaround, but this is still IMO a pretty significant
bug.






Re: Spark 2.0.0 preview docs uploaded

2016-07-19 Thread Pete Robbins
Are there any 'work in progress' release notes for 2.0.0 yet? I don't see
anything in the rc docs like "what's new" or "migration guide"?

On Thu, 9 Jun 2016 at 10:06 Sean Owen  wrote:

> Available but mostly as JIRA output:
> https://spark.apache.org/news/spark-2.0.0-preview.html
>
> On Thu, Jun 9, 2016 at 7:33 AM, Pete Robbins  wrote:
> > It would be nice to have a "what's new in 2.0.0" equivalent to
> > https://spark.apache.org/releases/spark-release-1-6-0.html available or
> am I
> > just missing it?
> >
> > On Wed, 8 Jun 2016 at 13:15 Sean Owen  wrote:
> >>
> >> OK, this is done:
> >>
> >> http://spark.apache.org/documentation.html
> >> http://spark.apache.org/docs/2.0.0-preview/
> >> http://spark.apache.org/docs/preview/
> >>
> >> On Tue, Jun 7, 2016 at 4:59 PM, Shivaram Venkataraman
> >>  wrote:
> >> > As far as I know the process is just to copy docs/_site from the build
> >> > to the appropriate location in the SVN repo (i.e.
> >> > site/docs/2.0.0-preview).
> >> >
> >> > Thanks
> >> > Shivaram
> >> >
> >> > On Tue, Jun 7, 2016 at 8:14 AM, Sean Owen  wrote:
> >> >> As a stop-gap, I can edit that page to have a small section about
> >> >> preview releases and point to the nightly docs.
> >> >>
> >> >> Not sure who has the power to push 2.0.0-preview to site/docs, but,
> if
> >> >> that's done then we can symlink "preview" in that dir to it and be
> >> >> done, and update this section about preview docs accordingly.
> >> >>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >
>


Re: Build changes after SPARK-13579

2016-07-19 Thread Sean Owen
If the change is just to replace "sbt assembly/assembly" with "sbt
package", done. LMK if there are more edits.

On Mon, Jul 18, 2016 at 10:00 PM, Michael Gummelt
 wrote:
> I just flailed on this a bit before finding this email.  Can someone please
> update
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
>
> On Mon, Apr 4, 2016 at 10:01 PM, Reynold Xin  wrote:
>>
>> pyspark and R
>>
>> On Mon, Apr 4, 2016 at 9:59 PM, Marcelo Vanzin 
>> wrote:
>>>
>>> No, tests (except pyspark) should work without having to package anything
>>> first.
>>>
>>> On Mon, Apr 4, 2016 at 9:58 PM, Koert Kuipers  wrote:
>>> > do i need to run sbt package before doing tests?
>>> >
>>> > On Mon, Apr 4, 2016 at 11:00 PM, Marcelo Vanzin 
>>> > wrote:
>>> >>
>>> >> Hey all,
>>> >>
>>> >> We merged  SPARK-13579 today, and if you're like me and have your
>>> >> hands automatically type "sbt assembly" anytime you're building Spark,
>>> >> that won't work anymore.
>>> >>
>>> >> You should now use "sbt package"; you'll still need "sbt assembly" if
>>> >> you require one of the remaining assemblies (streaming connectors,
>>> >> yarn shuffle service).
>>> >>
>>> >>
>>> >> --
>>> >> Marcelo
>>> >>
>>> >> -
>>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: dev-h...@spark.apache.org
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>
>
>
>
> --
> Michael Gummelt
> Software Engineer
> Mesosphere




Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-19 Thread Sean Owen
I think unfortunately at least this one is gonna block:
https://issues.apache.org/jira/browse/SPARK-16620

Good news is that just about anything else that's at all a blocker has
been resolved and there are only about 6 issues of any kind at all
targeted for 2.0. It seems very close.

On Thu, Jul 14, 2016 at 7:59 PM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Sunday, July 17, 2016 at 12:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc4
> (e5f8c1117e0c48499f54d62b556bc693435afae0).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1192/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc4-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
> Note: There was a mistake made during "rc3" preparation, and as a result
> there is no "rc3", but only "rc4".
>
