[VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-05 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version
2.0.0. The vote is open until Friday, July 8, 2016 at 23:00 PDT and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc2
(4a55b2326c8cf50f772907a8b73fd5e7b3d1aa06).

This release candidate resolves ~2500 issues:
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1189/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc2-docs/


=============================================
How can I help test this release?
=============================================
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions from 1.x.

=============================================
What justifies a -1 vote for this release?
=============================================
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new
features will not necessarily block this release. Note that historically
Spark documentation has been published on the website separately from the
main release so we do not need to block the release due to documentation
errors either.


Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Cody Koeninger
I don't think that's a Scala compiler bug.

println is a valid expression that returns Unit.

Unit is not a single-argument function, and does not match any of the
overloads of foreachPartition.

You may be used to a conversion taking place when println is passed to
a method expecting a function, but that's not a safe thing to do
silently for multiple overloads.

tldr;

just use

ds.foreachPartition(x => println(x))

you don't need any type annotations
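
To make the tl;dr concrete, here is a short shell sketch of the forms
discussed in this thread. It assumes a Spark 2.0 build with its default
Scala 2.11, with spark being the shell's SparkSession; the comments
restate the explanation above rather than any authoritative spec.

val ds = spark.range(10)   // Dataset[java.lang.Long]

// Does not compile: foreachPartition is overloaded (Scala's
// Iterator[T] => Unit vs Java's ForeachPartitionFunction[T]), so there is
// no single expected function type and println is not eta-expanded; it is
// typed as the Unit result of println(), and Unit matches neither
// alternative.
// ds.foreachPartition(println)

// Compiles: a function literal can only be the Scala overload. Note the
// argument is the whole partition Iterator, so this prints one line per
// partition rather than the elements.
ds.foreachPartition(x => println(x))

// Compiles: print every element; an explicit parameter type also works.
ds.foreachPartition((it: Iterator[java.lang.Long]) => it.foreach(println))

// Per-element printing is simpler with foreach, as noted further down
// the thread.
ds.foreach(x => println(x))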


On Tue, Jul 5, 2016 at 2:53 PM, Jacek Laskowski  wrote:
> Hi Reynold,
>
> Is this already reported and tracked somewhere? I'm quite sure that
> people will be asking about the reasons Spark does this. Where are
> such issues reported usually?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin  wrote:
>> This seems like a Scala compiler bug.
>>
>>
>> On Tuesday, July 5, 2016, Jacek Laskowski  wrote:
>>>
>>> Well, there is foreach for Java and another foreach for Scala. That's
>>> what I can understand. But while supporting two language-specific APIs
>>> -- Scala and Java -- the Dataset API lost support for such simple calls
>>> without type annotations, so you have to be explicit about the variant
>>> (since I'm using Scala, I want to use the Scala API, right?). It appears
>>> that any single-argument-function operators in Datasets are affected :(
>>>
>>> My question was whether there is work to fix it (if that's possible at
>>> all -- I don't know if it is).
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen  wrote:
>>> > Right, I should have noticed that in your second mail. But foreach
>>> > already does what you want, right? It would be identical here.
>>> >
>>> > Now these two methods do conceptually different things on different
>>> > arguments. I don't think I'd expect them to accept the same functions.
>>> >
>>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski  wrote:
>>> >> ds is Dataset and the problem is that println (or any other
>>> >> one-element function) would not work here (and perhaps other methods
>>> >> with two variants - Java's and Scala's).
>>> >>
>>> >> Pozdrawiam,
>>> >> Jacek Laskowski
>>> >> 
>>> >> https://medium.com/@jaceklaskowski/
>>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> >> Follow me at https://twitter.com/jaceklaskowski
>>> >>
>>> >>
>>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen  wrote:
>>> >>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>>> >>> expect to express an operation on a DStream as if it were elements.
>>> >>>
>>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski 
>>> >>> wrote:
>>>  Sort of. Your example works, but could you do a mere
>>>  ds.foreachPartition(println)? Why not? What should I even see the
>>>  Java
>>>  version?
>>> 
>>>  scala> val ds = spark.range(10)
>>>  ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>> 
>>>  scala> ds.foreachPartition(println)
>>>  :26: error: overloaded method value foreachPartition with
>>>  alternatives:
>>>    (func:
>>>  org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>>  
>>>    (f: Iterator[Long] => Unit)Unit
>>>   cannot be applied to (Unit)
>>> ds.foreachPartition(println)
>>>    ^
>>> 
>>>  Pozdrawiam,
>>>  Jacek Laskowski
>>>  
>>>  https://medium.com/@jaceklaskowski/
>>>  Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>>  Follow me at https://twitter.com/jaceklaskowski
>>> 
>>> 
>>>  On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen  wrote:
>>> > Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>> >
>>> > On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski 
>>> > wrote:
>>> >> Hi,
>>> >>
>>> >> It's with the master built today. Why can't I call
>>> >> ds.foreachPartition(println)? Is using type annotation the only way
>>> >> to
>>> >> go forward? I'd be so sad if that's the case.
>>> >>
>>> >> scala> ds.foreachPartition(println)
>>> >> :28: error: overloaded method value foreachPartition with
>>> >> alternatives:
>>> >>   (func:
>>> >> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>>> >> 
>>> >>   (f: Iterator[Record] => Unit)Unit
>>> >>  cannot be applied to (Unit)
>>> >>ds.foreachPartition(println)
>>> >>   ^
>>> >>
>>> >> scala> sc.version
>>> >> 

Re: spark git commit: [SPARK-15204][SQL] improve nullability inference for Aggregator

2016-07-05 Thread Reynold Xin
Jacek,

This is definitely not necessary, but I wouldn't waste cycles "fixing"
things like this when they have virtually zero impact. Perhaps next time we
update this code we can "fix" it.

Also can you comment on the pull request directly?
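
For context, here is a rough, self-contained sketch of the behaviour the
quoted test below exercises, using the public Aggregator API; the names
(NameAggSketch, the local SparkSession setup) are illustrative, not taken
from the actual suite.

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// A typed aggregator with a String output; per the commit message, the
// resulting column's nullability should follow the output encoder's schema.
object NameAggSketch extends Aggregator[String, String, String] {
  def zero: String = ""
  def reduce(b: String, a: String): String = if (a.length > b.length) a else b
  def merge(b1: String, b2: String): String = reduce(b1, b2)
  def finish(reduction: String): String = reduction
  def bufferEncoder: Encoder[String] = Encoders.STRING
  def outputEncoder: Encoder[String] = Encoders.STRING
}

val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
import spark.implicits._

val ds = Seq("a", "bb", "ccc").toDS()
// String output is nullable in its encoder schema, so the aggregated
// column should report nullable = true, matching the quoted assertion.
assert(ds.select(NameAggSketch.toColumn).schema.head.nullable)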


On Tue, Jul 5, 2016 at 1:07 PM, Jacek Laskowski  wrote:

> On Mon, Jul 4, 2016 at 6:14 AM,   wrote:
> > Repository: spark
> > Updated Branches:
> >   refs/heads/master 88134e736 -> 8cdb81fa8
> >
> >
> > [SPARK-15204][SQL] improve nullability inference for Aggregator
> >
> > ## What changes were proposed in this pull request?
> >
> > TypedAggregateExpression sets nullable based on the schema of the
> outputEncoder
> >
> > ## How was this patch tested?
> >
> > Add test in DatasetAggregatorSuite
> >
> > Author: Koert Kuipers 
> ...
> > +assert(ds1.select(typed.sum((i: Int) => i)).schema.head.nullable
> === false)
> > +val ds2 = Seq(AggData(1, "a"), AggData(2, "a")).toDS()
> > +assert(ds2.select(SeqAgg.toColumn).schema.head.nullable === true)
> > +val ds3 = sql("SELECT 'Some String' AS b, 1279869254 AS
> a").as[AggData]
> > +assert(ds3.select(NameAgg.toColumn).schema.head.nullable === true)
>
> Why do we assert predicates? If it's true, it's true already (no need
> to compare whether it's true or not). I'd vote to "fix" it.
>
> Jacek
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: spark git commit: [SPARK-15204][SQL] improve nullability inference for Aggregator

2016-07-05 Thread Koert Kuipers
Oh, you mean instead of:
assert(ds3.select(NameAgg.toColumn).schema.head.nullable === true)
just do:
assert(ds3.select(NameAgg.toColumn).schema.head.nullable)

I did mostly === true because I also had === false, and I liked the
symmetry, but sure, this can be fixed if it's not the norm.
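
For what it's worth, a tiny sketch of the two styles side by side, using
ScalaTest's FunSuite with plain booleans standing in for the
schema.head.nullable checks (the suite and value names here are made up
for illustration):

import org.scalatest.FunSuite

class AssertStyleSuite extends FunSuite {
  // Stand-ins for expressions like ds.select(...).schema.head.nullable.
  val nullableColumn = true
  val nonNullableColumn = false

  test("=== style keeps the true/false symmetry") {
    assert(nullableColumn === true)
    assert(nonNullableColumn === false)
  }

  test("plain predicate style") {
    assert(nullableColumn)
    assert(!nonNullableColumn)
  }
}

Depending on the ScalaTest version, the === form may produce a more
descriptive failure message than the bare predicate, which is one argument
for keeping the existing style.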

On Tue, Jul 5, 2016 at 4:07 PM, Jacek Laskowski  wrote:

> On Mon, Jul 4, 2016 at 6:14 AM,   wrote:
> > Repository: spark
> > Updated Branches:
> >   refs/heads/master 88134e736 -> 8cdb81fa8
> >
> >
> > [SPARK-15204][SQL] improve nullability inference for Aggregator
> >
> > ## What changes were proposed in this pull request?
> >
> > TypedAggregateExpression sets nullable based on the schema of the
> outputEncoder
> >
> > ## How was this patch tested?
> >
> > Add test in DatasetAggregatorSuite
> >
> > Author: Koert Kuipers 
> ...
> > +assert(ds1.select(typed.sum((i: Int) => i)).schema.head.nullable
> === false)
> > +val ds2 = Seq(AggData(1, "a"), AggData(2, "a")).toDS()
> > +assert(ds2.select(SeqAgg.toColumn).schema.head.nullable === true)
> > +val ds3 = sql("SELECT 'Some String' AS b, 1279869254 AS
> a").as[AggData]
> > +assert(ds3.select(NameAgg.toColumn).schema.head.nullable === true)
>
> Why do we assert predicates? If it's true, it's true already (no need
> to compare whether it's true or not). I'd vote to "fix" it.
>
> Jacek
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: spark git commit: [SPARK-15204][SQL] improve nullability inference for Aggregator

2016-07-05 Thread Jacek Laskowski
On Mon, Jul 4, 2016 at 6:14 AM,   wrote:
> Repository: spark
> Updated Branches:
>   refs/heads/master 88134e736 -> 8cdb81fa8
>
>
> [SPARK-15204][SQL] improve nullability inference for Aggregator
>
> ## What changes were proposed in this pull request?
>
> TypedAggregateExpression sets nullable based on the schema of the 
> outputEncoder
>
> ## How was this patch tested?
>
> Add test in DatasetAggregatorSuite
>
> Author: Koert Kuipers 
...
> +assert(ds1.select(typed.sum((i: Int) => i)).schema.head.nullable === 
> false)
> +val ds2 = Seq(AggData(1, "a"), AggData(2, "a")).toDS()
> +assert(ds2.select(SeqAgg.toColumn).schema.head.nullable === true)
> +val ds3 = sql("SELECT 'Some String' AS b, 1279869254 AS a").as[AggData]
> +assert(ds3.select(NameAgg.toColumn).schema.head.nullable === true)

Why do we assert predicates? If it's true, it's true already (no need
to compare whether it's true or not). I'd vote to "fix" it.

Jacek

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Hi Reynold,

Is this already reported and tracked somewhere? I'm quite sure that
people will be asking about the reasons Spark does this. Where are
such issues reported usually?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 6:19 PM, Reynold Xin  wrote:
> This seems like a Scala compiler bug.
>
>
> On Tuesday, July 5, 2016, Jacek Laskowski  wrote:
>>
>> Well, there is foreach for Java and another foreach for Scala. That's
>> what I can understand. But while supporting two language-specific APIs
>> -- Scala and Java -- the Dataset API lost support for such simple calls
>> without type annotations, so you have to be explicit about the variant
>> (since I'm using Scala, I want to use the Scala API, right?). It appears
>> that any single-argument-function operators in Datasets are affected :(
>>
>> My question was whether there is work to fix it (if that's possible at
>> all -- I don't know if it is).
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen  wrote:
>> > Right, I should have noticed that in your second mail. But foreach
>> > already does what you want, right? It would be identical here.
>> >
>> > Now these two methods do conceptually different things on different
>> > arguments. I don't think I'd expect them to accept the same functions.
>> >
>> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski  wrote:
>> >> ds is Dataset and the problem is that println (or any other
>> >> one-element function) would not work here (and perhaps other methods
>> >> with two variants - Java's and Scala's).
>> >>
>> >> Pozdrawiam,
>> >> Jacek Laskowski
>> >> 
>> >> https://medium.com/@jaceklaskowski/
>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >> Follow me at https://twitter.com/jaceklaskowski
>> >>
>> >>
>> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen  wrote:
>> >>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>> >>> expect to express an operation on a DStream as if it were elements.
>> >>>
>> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski 
>> >>> wrote:
>>  Sort of. Your example works, but could you do a mere
>>  ds.foreachPartition(println)? Why not? What should I even see the
>>  Java
>>  version?
>> 
>>  scala> val ds = spark.range(10)
>>  ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>> 
>>  scala> ds.foreachPartition(println)
>>  :26: error: overloaded method value foreachPartition with
>>  alternatives:
>>    (func:
>>  org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>  
>>    (f: Iterator[Long] => Unit)Unit
>>   cannot be applied to (Unit)
>> ds.foreachPartition(println)
>>    ^
>> 
>>  Pozdrawiam,
>>  Jacek Laskowski
>>  
>>  https://medium.com/@jaceklaskowski/
>>  Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>  Follow me at https://twitter.com/jaceklaskowski
>> 
>> 
>>  On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen  wrote:
>> > Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>> >
>> > On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski 
>> > wrote:
>> >> Hi,
>> >>
>> >> It's with the master built today. Why can't I call
>> >> ds.foreachPartition(println)? Is using type annotation the only way
>> >> to
>> >> go forward? I'd be so sad if that's the case.
>> >>
>> >> scala> ds.foreachPartition(println)
>> >> :28: error: overloaded method value foreachPartition with
>> >> alternatives:
>> >>   (func:
>> >> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>> >> 
>> >>   (f: Iterator[Record] => Unit)Unit
>> >>  cannot be applied to (Unit)
>> >>ds.foreachPartition(println)
>> >>   ^
>> >>
>> >> scala> sc.version
>> >> res9: String = 2.0.0-SNAPSHOT
>> >>
>> >> Pozdrawiam,
>> >> Jacek Laskowski
>> >> 
>> >> https://medium.com/@jaceklaskowski/
>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >> Follow me at https://twitter.com/jaceklaskowski
>> >>
>> >>
>> >> -
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-07-05 Thread Reynold Xin
Please consider this vote canceled and I will work on another RC soon.

On Tue, Jun 21, 2016 at 6:26 PM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc1
> (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
>
> This release candidate resolves ~2400 issues:
> https://s.apache.org/spark-2.0.0-rc1-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1187/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
>
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
>


Re: Call to new JObject sometimes returns an empty R environment

2016-07-05 Thread Shivaram Venkataraman
-sparkr-dev@googlegroups +dev@spark.apache.org

[Please send SparkR development questions to the Spark user / dev
mailing lists. Replies inline]

> From:  
> Date: Tue, Jul 5, 2016 at 3:30 AM
> Subject: Call to new JObject sometimes returns an empty R environment
> To: SparkR Developers 
>
>
>
>  Hi all,
>
>  I have recently moved from SparkR 1.5.2 to 1.6.0. I am doing some
> experiments using SparkR:::newJObject("java.util.HashMap") and I
> notice the behaviour has changed, and it now returns an "environment"
> instead of a "jobj":
>
>> print(class(SparkR:::newJObject("java.util.HashMap")))  # SparkR 1.5.2
> [1] "jobj"
>
>> print(class(SparkR:::newJObject("java.util.HashMap")))  # SparkR 1.6.0
> [1] "environment"
>
> Moreover, the environment returned is apparently empty (when I call
> ls() on the resulting environment, it returns character(0)) . This
> problem only happens with some Java classes. I am not able to say
> exactly which classes cause the problem.

The reason this is different in Spark 1.6 is that we added support for
automatically deserializing Maps returned from the JVM as environments
on the R side. The pull request
https://github.com/apache/spark/pull/8711 has some more details. The
reason BitSet / ArrayList "work" is that we don't do any special
serialization / de-serialization for them.

>
> If I try to create an instance of other classes such as
> java.util.BitSet, it works successfully. I thought it might be related
> to parameterized types, but it does work successfully with ArrayList
> and with HashSet, which take a parameter.
>
> Any suggestions on this change of behaviour (apart from "do not use
> private functions" :-)   ) ?

Unfortunately there isn't much more to say than that. The
serialization/de-serialization is an internal API and we don't claim
to maintain backwards compatibility. You might be able to work around
this particular issue by wrapping your Map in a different object.

Thanks
Shivaram
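
If it helps, here is a rough Scala sketch of that workaround -- purely
illustrative, not a SparkR or Spark API: a plain wrapper class is not
special-cased by the serializer, so R should get back a jobj reference it
can keep calling into.

// Hypothetical JVM-side helper; class and method names are made up, and
// it needs to be on the driver classpath.
package example

class MapHolder {
  // The java.util.HashMap stays on the JVM side; SparkR only sees the
  // wrapper object, which it does not convert into an R environment.
  private val underlying = new java.util.HashMap[String, Object]()

  def put(key: String, value: Object): Unit = underlying.put(key, value)
  def get(key: String): Object = underlying.get(key)
  def size(): Int = underlying.size()
}

From R, something like SparkR:::newJObject("example.MapHolder") followed by
SparkR:::callJMethod(holder, "put", "k", "v") should then behave like the
1.5.2 jobj flow -- with the caveat above that this internal API is not
guaranteed to stay stable.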

>
> Thank you very much
>
> --
> You received this message because you are subscribed to the Google
> Groups "SparkR Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sparkr-dev+unsubscr...@googlegroups.com.
> To post to this group, send email to sparkr-...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sparkr-dev/14dbc4ce-2579-4008-96ae-818d8a94a4a7%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Reynold Xin
This seems like a Scala compiler bug.

On Tuesday, July 5, 2016, Jacek Laskowski  wrote:

> Well, there is foreach for Java and another foreach for Scala. That's
> what I can understand. But while supporting two language-specific APIs
> -- Scala and Java -- the Dataset API lost support for such simple calls
> without type annotations, so you have to be explicit about the variant
> (since I'm using Scala, I want to use the Scala API, right?). It appears
> that any single-argument-function operators in Datasets are affected :(
>
> My question was whether there is work to fix it (if that's possible at
> all -- I don't know if it is).
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen  > wrote:
> > Right, I should have noticed that in your second mail. But foreach
> > already does what you want, right? It would be identical here.
> >
> > Now these two methods do conceptually different things on different
> > arguments. I don't think I'd expect them to accept the same functions.
> >
> > On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski  > wrote:
> >> ds is Dataset and the problem is that println (or any other
> >> one-element function) would not work here (and perhaps other methods
> >> with two variants - Java's and Scala's).
> >>
> >> Pozdrawiam,
> >> Jacek Laskowski
> >> 
> >> https://medium.com/@jaceklaskowski/
> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >> Follow me at https://twitter.com/jaceklaskowski
> >>
> >>
> >> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen  > wrote:
> >>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
> >>> expect to express an operation on a DStream as if it were elements.
> >>>
> >>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski  > wrote:
>  Sort of. Your example works, but could you do a mere
>  ds.foreachPartition(println)? Why not? What should I even see the Java
>  version?
> 
>  scala> val ds = spark.range(10)
>  ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
> 
>  scala> ds.foreachPartition(println)
>  :26: error: overloaded method value foreachPartition with
> alternatives:
>    (func:
> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>  
>    (f: Iterator[Long] => Unit)Unit
>   cannot be applied to (Unit)
> ds.foreachPartition(println)
>    ^
> 
>  Pozdrawiam,
>  Jacek Laskowski
>  
>  https://medium.com/@jaceklaskowski/
>  Mastering Apache Spark http://bit.ly/mastering-apache-spark
>  Follow me at https://twitter.com/jaceklaskowski
> 
> 
>  On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen  > wrote:
> > Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
> >
> > On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski  > wrote:
> >> Hi,
> >>
> >> It's with the master built today. Why can't I call
> >> ds.foreachPartition(println)? Is using type annotation the only way
> to
> >> go forward? I'd be so sad if that's the case.
> >>
> >> scala> ds.foreachPartition(println)
> >> :28: error: overloaded method value foreachPartition with
> alternatives:
> >>   (func:
> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
> >> 
> >>   (f: Iterator[Record] => Unit)Unit
> >>  cannot be applied to (Unit)
> >>ds.foreachPartition(println)
> >>   ^
> >>
> >> scala> sc.version
> >> res9: String = 2.0.0-SNAPSHOT
> >>
> >> Pozdrawiam,
> >> Jacek Laskowski
> >> 
> >> https://medium.com/@jaceklaskowski/
> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> >> Follow me at https://twitter.com/jaceklaskowski
> >>
> >>
> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
>
>


Re: SparkSession replace SQLContext

2016-07-05 Thread Michael Allman
These topics have been included in the documentation for recent builds of Spark 
2.0.

Michael
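
For anyone reading this thread before those docs are published, here is a
minimal sketch of the new entry point, pieced together from the
2.0.0-preview Javadoc linked below; treat the details (app name, file
path) as placeholders.

import org.apache.spark.sql.SparkSession

// SparkSession subsumes SQLContext (and HiveContext) as the entry point.
val spark = SparkSession.builder()
  .appName("SparkSessionExample")
  .master("local[*]")        // for a local test; omit under spark-submit
  .getOrCreate()

import spark.implicits._     // toDF/toDS etc., as SQLContext.implicits did

val df = spark.read.json("examples/src/main/resources/people.json")
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people").show()

// The old entry point is still reachable for compatibility:
val sqlContext = spark.sqlContext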

> On Jul 5, 2016, at 3:49 AM, Romi Kuntsman  wrote:
>
> You can also claim that there's a whole section of "Migrating from 1.6 to
> 2.0" missing there:
> https://spark.apache.org/docs/2.0.0-preview/sql-programming-guide.html#migration-guide
>
> Romi Kuntsman, Big Data Engineer
> http://www.totango.com
>
> On Tue, Jul 5, 2016 at 12:24 PM, nihed mbarek  wrote:
> Hi,
>
> I just discovered that SparkSession will replace SQLContext for Spark 2.0.
> The JavaDoc is clear:
> https://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/sql/SparkSession.html
> but there is no mention in the SQL programming guide:
> https://spark.apache.org/docs/2.0.0-preview/sql-programming-guide.html#starting-point-sqlcontext
>
> Is it possible to update the documentation before the release?
>
> Thank you
>
> --
> MBAREK Med Nihed,
> Fedora Ambassador, TUNISIA, Northern Africa
> http://www.nihed.com
>



Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Well, there is foreach for Java and another foreach for Scala. That's
what I can understand. But while supporting two language-specific APIs
-- Scala and Java -- the Dataset API lost support for such simple calls
without type annotations, so you have to be explicit about the variant
(since I'm using Scala, I want to use the Scala API, right?). It appears
that any single-argument-function operators in Datasets are affected :(

My question was whether there is work to fix it (if that's possible at
all -- I don't know if it is).

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 4:21 PM, Sean Owen  wrote:
> Right, I should have noticed that in your second mail. But foreach
> already does what you want, right? It would be identical here.
>
> Now these two methods do conceptually different things on different
> arguments. I don't think I'd expect them to accept the same functions.
>
> On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski  wrote:
>> ds is Dataset and the problem is that println (or any other
>> one-element function) would not work here (and perhaps other methods
>> with two variants - Java's and Scala's).
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen  wrote:
>>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>>> expect to express an operation on a DStream as if it were elements.
>>>
>>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski  wrote:
 Sort of. Your example works, but could you do a mere
 ds.foreachPartition(println)? Why not? What should I even see the Java
 version?

 scala> val ds = spark.range(10)
 ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]

 scala> ds.foreachPartition(println)
 :26: error: overloaded method value foreachPartition with 
 alternatives:
   (func: 
 org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
 
   (f: Iterator[Long] => Unit)Unit
  cannot be applied to (Unit)
ds.foreachPartition(println)
   ^

 Pozdrawiam,
 Jacek Laskowski
 
 https://medium.com/@jaceklaskowski/
 Mastering Apache Spark http://bit.ly/mastering-apache-spark
 Follow me at https://twitter.com/jaceklaskowski


 On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen  wrote:
> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>
> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski  wrote:
>> Hi,
>>
>> It's with the master built today. Why can't I call
>> ds.foreachPartition(println)? Is using type annotation the only way to
>> go forward? I'd be so sad if that's the case.
>>
>> scala> ds.foreachPartition(println)
>> :28: error: overloaded method value foreachPartition with 
>> alternatives:
>>   (func: 
>> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
>> 
>>   (f: Iterator[Record] => Unit)Unit
>>  cannot be applied to (Unit)
>>ds.foreachPartition(println)
>>   ^
>>
>> scala> sc.version
>> res9: String = 2.0.0-SNAPSHOT
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Sean Owen
Right, I should have noticed that in your second mail. But foreach
already does what you want, right? It would be identical here.

Now these two methods do conceptually different things on different
arguments. I don't think I'd expect them to accept the same functions.

On Tue, Jul 5, 2016 at 3:18 PM, Jacek Laskowski  wrote:
> ds is Dataset and the problem is that println (or any other
> one-element function) would not work here (and perhaps other methods
> with two variants - Java's and Scala's).
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen  wrote:
>> A DStream is a sequence of RDDs, not of elements. I don't think I'd
>> expect to express an operation on a DStream as if it were elements.
>>
>> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski  wrote:
>>> Sort of. Your example works, but could you do a mere
>>> ds.foreachPartition(println)? Why not? What should I even see the Java
>>> version?
>>>
>>> scala> val ds = spark.range(10)
>>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>>
>>> scala> ds.foreachPartition(println)
>>> :26: error: overloaded method value foreachPartition with 
>>> alternatives:
>>>   (func: 
>>> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>>> 
>>>   (f: Iterator[Long] => Unit)Unit
>>>  cannot be applied to (Unit)
>>>ds.foreachPartition(println)
>>>   ^
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen  wrote:
 Do you not mean ds.foreachPartition(_.foreach(println)) or similar?

 On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski  wrote:
> Hi,
>
> It's with the master built today. Why can't I call
> ds.foreachPartition(println)? Is using type annotation the only way to
> go forward? I'd be so sad if that's the case.
>
> scala> ds.foreachPartition(println)
> :28: error: overloaded method value foreachPartition with 
> alternatives:
>   (func: 
> org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
> 
>   (f: Iterator[Record] => Unit)Unit
>  cannot be applied to (Unit)
>ds.foreachPartition(println)
>   ^
>
> scala> sc.version
> res9: String = 2.0.0-SNAPSHOT
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
ds is Dataset and the problem is that println (or any other
one-element function) would not work here (and perhaps other methods
with two variants - Java's and Scala's).

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jul 5, 2016 at 3:53 PM, Sean Owen  wrote:
> A DStream is a sequence of RDDs, not of elements. I don't think I'd
> expect to express an operation on a DStream as if it were elements.
>
> On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski  wrote:
>> Sort of. Your example works, but could you do a mere
>> ds.foreachPartition(println)? Why not? What should I even see the Java
>> version?
>>
>> scala> val ds = spark.range(10)
>> ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>>
>> scala> ds.foreachPartition(println)
>> :26: error: overloaded method value foreachPartition with 
>> alternatives:
>>   (func: 
>> org.apache.spark.api.java.function.ForeachPartitionFunction[Long])Unit
>> 
>>   (f: Iterator[Long] => Unit)Unit
>>  cannot be applied to (Unit)
>>ds.foreachPartition(println)
>>   ^
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Tue, Jul 5, 2016 at 3:32 PM, Sean Owen  wrote:
>>> Do you not mean ds.foreachPartition(_.foreach(println)) or similar?
>>>
>>> On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski  wrote:
 Hi,

 It's with the master built today. Why can't I call
 ds.foreachPartition(println)? Is using type annotation the only way to
 go forward? I'd be so sad if that's the case.

 scala> ds.foreachPartition(println)
 :28: error: overloaded method value foreachPartition with 
 alternatives:
   (func: 
 org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit
 
   (f: Iterator[Record] => Unit)Unit
  cannot be applied to (Unit)
ds.foreachPartition(println)
   ^

 scala> sc.version
 res9: String = 2.0.0-SNAPSHOT

 Pozdrawiam,
 Jacek Laskowski
 
 https://medium.com/@jaceklaskowski/
 Mastering Apache Spark http://bit.ly/mastering-apache-spark
 Follow me at https://twitter.com/jaceklaskowski

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Hi,

It's with the master built today. Why can't I call
ds.foreachPartition(println)? Is using type annotation the only way to
go forward? I'd be so sad if that's the case.

scala> ds.foreachPartition(println)
<console>:28: error: overloaded method value foreachPartition with alternatives:
  (func: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit <and>
  (f: Iterator[Record] => Unit)Unit
 cannot be applied to (Unit)
       ds.foreachPartition(println)
          ^

scala> sc.version
res9: String = 2.0.0-SNAPSHOT

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: SparkSession replace SQLContext

2016-07-05 Thread Romi Kuntsman
You can also claim that there's a whole section of "Migrating from 1.6 to
2.0" missing there:
https://spark.apache.org/docs/2.0.0-preview/sql-programming-guide.html#migration-guide

Romi Kuntsman, Big Data Engineer
http://www.totango.com

On Tue, Jul 5, 2016 at 12:24 PM, nihed mbarek  wrote:

> Hi,
>
> I just discovered that SparkSession will replace SQLContext for Spark
> 2.0.
> The JavaDoc is clear:
> https://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/sql/SparkSession.html
> but there is no mention in the SQL programming guide:
> https://spark.apache.org/docs/2.0.0-preview/sql-programming-guide.html#starting-point-sqlcontext
>
> Is it possible to update the documentation before the release?
>
>
> Thank you
>
> --
>
> MBAREK Med Nihed,
> Fedora Ambassador, TUNISIA, Northern Africa
> http://www.nihed.com
>
> 
>
>


SparkSession replace SQLContext

2016-07-05 Thread nihed mbarek
Hi,

I just discovered that SparkSession will replace SQLContext for Spark
2.0.
The JavaDoc is clear:
https://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/sql/SparkSession.html
but there is no mention in the SQL programming guide:
https://spark.apache.org/docs/2.0.0-preview/sql-programming-guide.html#starting-point-sqlcontext

Is it possible to update the documentation before the release?


Thank you

-- 

MBAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com