[SparkSQL 1.4.0]The result of SUM(xxx) in SparkSQL is 0.0 but not null when the column xxx is all null

2015-07-02 Thread StanZhai
Hi all, 

I have a table named test like this:

|  a  |  b  |
|  1  | null |
|  2  | null |

After upgrading the cluster from Spark 1.3.1 to 1.4.0, I found that the sum
function behaves differently in Spark 1.4 than it did in 1.3.

The SQL is: select sum(b) from test

In Spark 1.4.0 the result is 0.0; in Spark 1.3.1 the result is null. I think
the result should be null. Why is the result 0.0 in 1.4.0 instead of null? Is
this a bug?
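
For reference, a minimal way to reproduce this in spark-shell (a sketch with
made-up column types; sqlContext is the one created by the shell):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, DoubleType}

val schema = StructType(Seq(
  StructField("a", IntegerType, nullable = true),
  StructField("b", DoubleType, nullable = true)))
val rows = sc.parallelize(Seq(Row(1, null), Row(2, null)))
val df = sqlContext.createDataFrame(rows, schema)
df.registerTempTable("test")

// Spark 1.3.1 prints null here, Spark 1.4.0 prints 0.0:
sqlContext.sql("select sum(b) from test").show()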

Any hint is appreciated.







Re: except vs subtract

2015-07-02 Thread Reynold Xin
"except" is a keyword in Python unfortunately.



On Thu, Jul 2, 2015 at 11:54 PM, Krishna Sankar  wrote:

> Guys,
>Scala says except while python has subtract. (I verified that except
> doesn't exist in python) Why the difference in syntax for the same
> functionality ?
> Cheers
> 
>


except vs subtract

2015-07-02 Thread Krishna Sankar
Guys,
   Scala has except while Python has subtract. (I verified that except
doesn't exist in Python.) Why the difference in naming for the same
functionality?
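
For reference, the two calls I mean (a sketch assuming two DataFrames df1 and
df2 with the same schema):

// Scala DataFrame API (Spark 1.4):
val diff = df1.except(df2)

// PySpark names the same operation subtract:
//   diff = df1.subtract(df2)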
Cheers



Differential Equation Spark Solver

2015-07-02 Thread jamaica
Dear Spark Devs,

I have written an experimental 1D Laplace parallel Spark solver, out of
curiosity regarding this thread.
It may still have some unknown bugs and only works for very special cases,
but I think it shows that this can be done with Spark.
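
Roughly, the kind of per-iteration update involved, as a toy sketch using the
MLlib sliding helper (made-up sizes; this is not the linked code):

import org.apache.spark.mllib.rdd.RDDFunctions._

// 1D Laplace (u'' = 0) with a Jacobi sweep: each interior point becomes the
// average of its two neighbours. sliding(3) yields the [u(i-1), u(i), u(i+1)] windows.
val u0 = sc.parallelize(Array.tabulate(100)(i => if (i == 0) 1.0 else 0.0), 4)
val interior = u0.sliding(3).map(w => 0.5 * (w(0) + w(2)))
// A real solver re-attaches the two global boundary values, iterates until the
// residual is small, and avoids reshuffling the whole grid between sweeps.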

When solving large scale differential equations, one may consider using
MPI/PVM based parallel machines because of their raw power and efficiency.
But then again I think Spark (in-memory) map-reduce can also be reasonably
fast and easy to use, with some tweaks.

- For example, a tweak such as transferring data only between specific
clusters will significantly improve both memory efficiency and speed
(partial broadcasting?).

Still, I do not know how many people might be interested in this, or how much
of a contribution it could make to the Spark ecosystem.
So if you are interested in building a differential equation Spark solver, or
can suggest where I can discuss this with potentially interested parties, or
have any other suggestions / information, please let me know!

Thank you.

Best regards,
Myunsoo







Re: Grouping runs of elements in a RDD

2015-07-02 Thread RJ Nowling
Thanks, Mohit.  It sounds like we're on the same page -- I used a similar
approach.
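
For the archives, a simplified sketch of this kind of two-pass grouping (the
predicate and names are made up, it assumes a record never spans more than two
partitions, and it ignores empty partitions):

import org.apache.spark.rdd.RDD

// Hypothetical predicate: a line starts a new record unless it is indented.
def startsNewRecord(line: String): Boolean = !line.startsWith(" ")

def regroup(lines: RDD[String]): RDD[String] = {
  // Pass 1: for each partition, collect the leading lines that really belong to
  // the previous partition's last record (only a handful of lines per partition).
  val carried = lines.mapPartitionsWithIndex { (pid, it) =>
    Iterator((pid, it.takeWhile(l => !startsNewRecord(l)).toList))
  }.collectAsMap()
  val bCarried = lines.sparkContext.broadcast(carried)

  // Pass 2: drop the carried prefix locally, group runs within the partition,
  // then append the prefix the next partition carried for us to the last record.
  lines.mapPartitionsWithIndex { (pid, it) =>
    val local = it.dropWhile(l => !startsNewRecord(l)).toList ++
      bCarried.value.getOrElse(pid + 1, Nil)
    local.foldLeft(List.empty[List[String]]) {
      case (acc, l) if startsNewRecord(l) || acc.isEmpty => List(l) :: acc
      case (group :: rest, l) => (l :: group) :: rest
    }.map(_.reverse.mkString("\n")).reverse.iterator
  }
}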

On Thu, Jul 2, 2015 at 12:27 PM, Mohit Jaggi  wrote:

> if you are joining successive lines together based on a predicate, then
> you are doing a "flatMap" not an "aggregate". you are on the right track
> with a multi-pass solution. i had the same challenge when i needed a
> sliding window over an RDD(see below).
>
> [ i had suggested that the sliding window API be moved to spark-core. not
> sure if that happened ]
>
> - previous posts ---
>
>
> http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.mllib.rdd.RDDFunctions
>
> > On Fri, Jan 30, 2015 at 12:27 AM, Mohit Jaggi 
> > wrote:
> >
> >
> > http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
> >
> > you can use the MLLib function or do the following (which is what I had
> > done):
> >
> > - in the first pass over the data, using mapPartitionsWithIndex, gather the
> > first item in each partition. you can use collect (or an aggregator) for this.
> > “key” them by the partition index. at the end, you will have a map
> > (partition index) --> first item
> > - in the second pass over the data, using mapPartitionsWithIndex again,
> > look at two (or, in the general case, N; you can use scala’s sliding
> > iterator) items at a time and check the time difference (or any sliding
> > window computation). To this mapPartitions call, pass the map created in the
> > previous step. You will need to use it to check the last item in this
> > partition.
> >
> > If you can tolerate a few inaccuracies then you can just do the second
> > step. You will miss the “boundaries” of the partitions but it might be
> > acceptable for your use case.
>
>
> On Tue, Jun 30, 2015 at 12:21 PM, RJ Nowling  wrote:
>
>> That's an interesting idea!  I hadn't considered that.  However, looking
>> at the Partitioner interface, I would need to know from looking at a single
>> key which doesn't fit my case, unfortunately.  For my case, I need to
>> compare successive pairs of keys.  (I'm trying to re-join lines that were
>> split prematurely.)
>>
>> On Tue, Jun 30, 2015 at 2:07 PM, Abhishek R. Singh <
>> abhis...@tetrationanalytics.com> wrote:
>>
>>> could you use a custom partitioner to preserve boundaries such that all
>>> related tuples end up on the same partition?
>>>
>>> On Jun 30, 2015, at 12:00 PM, RJ Nowling  wrote:
>>>
>>> Thanks, Reynold.  I still need to handle incomplete groups that fall
>>> between partition boundaries. So, I need a two-pass approach. I came up
>>> with a somewhat hacky way to handle those using the partition indices and
>>> key-value pairs as a second pass after the first.
>>>
>>> OCaml's std library provides a function called group() that takes a
>>> break function that operates on pairs of successive elements.  It seems a
>>> similar approach could be used in Spark and would be more efficient than my
>>> approach with key-value pairs since you know the ordering of the partitions.
>>>
>>> Has this need been expressed by others?
>>>
>>> On Tue, Jun 30, 2015 at 1:03 PM, Reynold Xin 
>>> wrote:
>>>
 Try mapPartitions, which gives you an iterator, and you can produce an
 iterator back.


 On Tue, Jun 30, 2015 at 11:01 AM, RJ Nowling 
 wrote:

> Hi all,
>
> I have a problem where I have a RDD of elements:
>
> Item1 Item2 Item3 Item4 Item5 Item6 ...
>
> and I want to run a function over them to decide which runs of
> elements to group together:
>
> [Item1 Item2] [Item3] [Item4 Item5 Item6] ...
>
> Technically, I could use aggregate to do this, but I would have to use
> a List of List of T which would produce a very large collection in memory.
>
> Is there an easy way to accomplish this?  e.g., it would be nice to
> have a version of aggregate where the combination function can return a
> complete group that is added to the new RDD and an incomplete group which
> is passed to the next call of the reduce function.
>
> Thanks,
> RJ
>


>>>
>>>
>>
>


Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Andrew Or
@Sean I believe that is a real issue. I have submitted a patch to fix it:
https://github.com/apache/spark/pull/7193. Unfortunately this would mean we
need to cut a new RC to include it. When we do so, we should also do another
careful pass over the commits that have been merged since the first RC.

-1

2015-07-02 9:10 GMT-07:00 Shivaram Venkataraman 
:

> +1 Tested the EC2 launch scripts and the Spark version and EC2 branch etc.
> look good.
>
> Shivaram
>
> On Thu, Jul 2, 2015 at 8:22 AM, Patrick Wendell 
> wrote:
>
>> Hey Sean - yes I think that is an issue. Our published poms need to
>> have the dependency versions inlined.
>>
>> We probably need to revert that bit of the build patch.
>>
>> - Patrick
>>
>> On Thu, Jul 2, 2015 at 7:21 AM, vaquar khan 
>> wrote:
>> > +1
>> >
>> > On 2 Jul 2015 18:03, "shenyan zhen"  wrote:
>> >>
>> >> +1
>> >>
>> >> On Jun 30, 2015 8:28 PM, "Reynold Xin"  wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell > >
>> >>> wrote:
>> 
>>  Please vote on releasing the following candidate as Apache Spark
>> version
>>  1.4.1!
>> 
>>  This release fixes a handful of known issues in Spark 1.4.0, listed
>>  here:
>>  http://s.apache.org/spark-1.4.1
>> 
>>  The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>>  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>>  60e08e50751fe3929156de956d62faea79f5b801
>> 
>>  The release files, including signatures, digests, etc. can be found
>> at:
>> 
>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>> 
>>  Release artifacts are signed with the following key:
>>  https://people.apache.org/keys/committer/pwendell.asc
>> 
>>  The staging repository for this release can be found at:
>>  [published as version: 1.4.1]
>> 
>> https://repository.apache.org/content/repositories/orgapachespark-1118/
>>  [published as version: 1.4.1-rc1]
>> 
>> https://repository.apache.org/content/repositories/orgapachespark-1119/
>> 
>>  The documentation corresponding to this release can be found at:
>> 
>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>> 
>>  Please vote on releasing this package as Apache Spark 1.4.1!
>> 
>>  The vote is open until Saturday, June 27, at 06:32 UTC and passes
>>  if a majority of at least 3 +1 PMC votes are cast.
>> 
>>  [ ] +1 Release this package as Apache Spark 1.4.1
>>  [ ] -1 Do not release this package because ...
>> 
>>  To learn more about Apache Spark, please see
>>  http://spark.apache.org/
>> 
>>  -
>>  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>  For additional commands, e-mail: dev-h...@spark.apache.org
>> 
>> >>>
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>


Re: Grouping runs of elements in a RDD

2015-07-02 Thread Mohit Jaggi
if you are joining successive lines together based on a predicate, then you
are doing a "flatMap" not an "aggregate". you are on the right track with a
multi-pass solution. i had the same challenge when i needed a sliding
window over an RDD(see below).

[ i had suggested that the sliding window API be moved to spark-core. not
sure if that happened ]

- previous posts ---

http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.mllib.rdd.RDDFunctions

> On Fri, Jan 30, 2015 at 12:27 AM, Mohit Jaggi 
> wrote:
>
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
>
> you can use the MLLib function or do the following (which is what I had
> done):
>
> - in the first pass over the data, using mapPartitionsWithIndex, gather the
> first item in each partition. you can use collect (or an aggregator) for this.
> “key” them by the partition index. at the end, you will have a map
> (partition index) --> first item
> - in the second pass over the data, using mapPartitionsWithIndex again,
> look at two (or, in the general case, N; you can use scala’s sliding
> iterator) items at a time and check the time difference (or any sliding
> window computation). To this mapPartitions call, pass the map created in the
> previous step. You will need to use it to check the last item in this
> partition.
>
> If you can tolerate a few inaccuracies then you can just do the second
> step. You will miss the “boundaries” of the partitions but it might be
> acceptable for your use case.


On Tue, Jun 30, 2015 at 12:21 PM, RJ Nowling  wrote:

> That's an interesting idea!  I hadn't considered that.  However, looking
> at the Partitioner interface, I would need to know from looking at a single
> key which doesn't fit my case, unfortunately.  For my case, I need to
> compare successive pairs of keys.  (I'm trying to re-join lines that were
> split prematurely.)
>
> On Tue, Jun 30, 2015 at 2:07 PM, Abhishek R. Singh <
> abhis...@tetrationanalytics.com> wrote:
>
>> could you use a custom partitioner to preserve boundaries such that all
>> related tuples end up on the same partition?
>>
>> On Jun 30, 2015, at 12:00 PM, RJ Nowling  wrote:
>>
>> Thanks, Reynold.  I still need to handle incomplete groups that fall
>> between partition boundaries. So, I need a two-pass approach. I came up
>> with a somewhat hacky way to handle those using the partition indices and
>> key-value pairs as a second pass after the first.
>>
>> OCaml's std library provides a function called group() that takes a break
>> function that operates on pairs of successive elements.  It seems a
>> similar approach could be used in Spark and would be more efficient than my
>> approach with key-value pairs since you know the ordering of the partitions.
>>
>> Has this need been expressed by others?
>>
>> On Tue, Jun 30, 2015 at 1:03 PM, Reynold Xin  wrote:
>>
>>> Try mapPartitions, which gives you an iterator, and you can produce an
>>> iterator back.
>>>
>>>
>>> On Tue, Jun 30, 2015 at 11:01 AM, RJ Nowling  wrote:
>>>
 Hi all,

 I have a problem where I have a RDD of elements:

 Item1 Item2 Item3 Item4 Item5 Item6 ...

 and I want to run a function over them to decide which runs of elements
 to group together:

 [Item1 Item2] [Item3] [Item4 Item5 Item6] ...

 Technically, I could use aggregate to do this, but I would have to use
 a List of List of T which would produce a very large collection in memory.

 Is there an easy way to accomplish this?  e.g., it would be nice to
 have a version of aggregate where the combination function can return a
 complete group that is added to the new RDD and an incomplete group which
 is passed to the next call of the reduce function.

 Thanks,
 RJ

>>>
>>>
>>
>>
>


A proposal for Test matrix decompositions for speed/stability (SPARK-7210)

2015-07-02 Thread Chris Harvey
Hello,

I am new to the Apache Spark project but I would like to contribute to
issue SPARK-7210. There has been some conversation on that issue and I
would like to take a shot at it. Before doing so, I want to run my plan by
everyone.

From the description and the comments, the goal is to test other methods of
computing the MVN pdf. The stated concern is that the SVD used is slow
despite being numerically stable, and that speed and stability may
become problematic as the number of features grows.

In the comments, Feynman posted an R recipe for computing the pdf using a
Cholesky trick. I would like to compute the pdf by following that recipe
while using the Cholesky implementation found in Scalanlp Breeze. To test
speed I would estimate the pdf using the original method and the Cholesky
method across a range of simulated datasets with growing n and p. To test
stability I would estimate the pdf on simulated features with some
multicollinearity.
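
Concretely, the core computation I have in mind looks roughly like this (a
sketch using Breeze; the function name is mine, and the SVD baseline and the
benchmarking harness are omitted):

import breeze.linalg.{DenseMatrix, DenseVector, cholesky, diag, sum}
import breeze.numerics.log

// log N(x | mu, sigma) via the lower-triangular Cholesky factor L (sigma = L * L.t):
// log|sigma| = 2 * sum(log(diag(L))) and (x - mu)' sigma^-1 (x - mu) = ||L \ (x - mu)||^2
def mvnLogPdf(x: DenseVector[Double],
              mu: DenseVector[Double],
              sigma: DenseMatrix[Double]): Double = {
  val chol = cholesky(sigma)
  // a dedicated triangular solve would be faster than the generic \, but this shows the math
  val z = chol \ (x - mu)
  -0.5 * (x.length * math.log(2.0 * math.Pi) + 2.0 * sum(log(diag(chol))) + (z dot z))
}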

Does this sound like a good starting point? Am I thinking of this correctly?

Given that this is my first attempt at contributing to an Apache project,
might it be a good idea to do this through the Mentor Programme?

Please let me know how this sounds, and I can provide some personal details
about my experience and motivations.

Thanks,

Chris


Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Shivaram Venkataraman
+1 Tested the EC2 launch scripts and the Spark version and EC2 branch etc.
look good.

Shivaram

On Thu, Jul 2, 2015 at 8:22 AM, Patrick Wendell  wrote:

> Hey Sean - yes I think that is an issue. Our published poms need to
> have the dependency versions inlined.
>
> We probably need to revert that bit of the build patch.
>
> - Patrick
>
> On Thu, Jul 2, 2015 at 7:21 AM, vaquar khan  wrote:
> > +1
> >
> > On 2 Jul 2015 18:03, "shenyan zhen"  wrote:
> >>
> >> +1
> >>
> >> On Jun 30, 2015 8:28 PM, "Reynold Xin"  wrote:
> >>>
> >>> +1
> >>>
> >>> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
> >>> wrote:
> 
>  Please vote on releasing the following candidate as Apache Spark
> version
>  1.4.1!
> 
>  This release fixes a handful of known issues in Spark 1.4.0, listed
>  here:
>  http://s.apache.org/spark-1.4.1
> 
>  The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>  60e08e50751fe3929156de956d62faea79f5b801
> 
>  The release files, including signatures, digests, etc. can be found
> at:
> 
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
> 
>  Release artifacts are signed with the following key:
>  https://people.apache.org/keys/committer/pwendell.asc
> 
>  The staging repository for this release can be found at:
>  [published as version: 1.4.1]
> 
> https://repository.apache.org/content/repositories/orgapachespark-1118/
>  [published as version: 1.4.1-rc1]
> 
> https://repository.apache.org/content/repositories/orgapachespark-1119/
> 
>  The documentation corresponding to this release can be found at:
> 
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
> 
>  Please vote on releasing this package as Apache Spark 1.4.1!
> 
>  The vote is open until Saturday, June 27, at 06:32 UTC and passes
>  if a majority of at least 3 +1 PMC votes are cast.
> 
>  [ ] +1 Release this package as Apache Spark 1.4.1
>  [ ] -1 Do not release this package because ...
> 
>  To learn more about Apache Spark, please see
>  http://spark.apache.org/
> 
>  -
>  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>  For additional commands, e-mail: dev-h...@spark.apache.org
> 
> >>>
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


[SPARK-8794] [SQL] PrunedScan problem

2015-07-02 Thread Eron Wright
I filed an issue about a problem I see with PrunedScan that causes sub-optimal
performance in ML pipelines.
Sorry if the issue is already known.
Having tried a few approaches to working with large binary files in Spark ML,
I prefer loading the data into a vector-type column from a relation supporting
pruned scan. This is better, I think, than a lazy-loading scheme based on
binaryFiles/PortableDataStream. SPARK-8794 undermines that approach.
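
For context, the kind of relation I mean, as a bare-bones sketch (the class
name, schema, and stubbed loader are hypothetical, not my actual code):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, PrunedScan}
import org.apache.spark.sql.types._

// A cheap `path` column next to an expensive `features` column. With PrunedScan
// working as intended, "select path from ..." never has to materialize `features`.
class BinaryFeatureRelation(paths: Seq[String], val sqlContext: SQLContext)
  extends BaseRelation with PrunedScan {

  override val schema = StructType(Seq(
    StructField("path", StringType, nullable = false),
    StructField("features", ArrayType(DoubleType), nullable = true)))

  override def buildScan(requiredColumns: Array[String]): RDD[Row] = {
    val cols = requiredColumns
    val files = paths // local copies so the task closure does not capture `this`
    sqlContext.sparkContext.parallelize(files).map { p =>
      Row.fromSeq(cols.map {
        case "path"     => p
        case "features" => Array.empty[Double] // the expensive load happens here, only when requested
      })
    }
  }
}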
Eron  

Size of RDD partitions

2015-07-02 Thread prateek3.14
Hello everyone,
  Are there metrics for capturing the size of RDD partitions? Would the
memory usage of an executor be a good proxy for the same?
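
For context, per-partition record counts are easy to get (a sketch; this counts
elements rather than bytes, with rdd standing for any RDD):

val recordsPerPartition = rdd.mapPartitionsWithIndex { (i, it) =>
  Iterator((i, it.size))   // (partition index, number of records)
}.collect()
// Byte sizes are harder: executor memory also covers shuffle and cache
// overhead, so it is only a rough proxy for the size of any one partition.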

Thanks,
--Prateek





Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Patrick Wendell
Hey Sean - yes I think that is an issue. Our published poms need to
have the dependency versions inlined.

We probably need to revert that bit of the build patch.

- Patrick

On Thu, Jul 2, 2015 at 7:21 AM, vaquar khan  wrote:
> +1
>
> On 2 Jul 2015 18:03, "shenyan zhen"  wrote:
>>
>> +1
>>
>> On Jun 30, 2015 8:28 PM, "Reynold Xin"  wrote:
>>>
>>> +1
>>>
>>> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
>>> wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed
 here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 60e08e50751fe3929156de956d62faea79f5b801

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1118/
 [published as version: 1.4.1-rc1]
 https://repository.apache.org/content/repositories/orgapachespark-1119/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Saturday, June 27, at 06:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org

>>>
>




Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread vaquar khan
+1
On 2 Jul 2015 18:03, "shenyan zhen"  wrote:

> +1
> On Jun 30, 2015 8:28 PM, "Reynold Xin"  wrote:
>
>> +1
>>
>> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 1.4.1!
>>>
>>> This release fixes a handful of known issues in Spark 1.4.0, listed here:
>>> http://s.apache.org/spark-1.4.1
>>>
>>> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>>> 60e08e50751fe3929156de956d62faea79f5b801
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> [published as version: 1.4.1]
>>> https://repository.apache.org/content/repositories/orgapachespark-1118/
>>> [published as version: 1.4.1-rc1]
>>> https://repository.apache.org/content/repositories/orgapachespark-1119/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>>>
>>> Please vote on releasing this package as Apache Spark 1.4.1!
>>>
>>> The vote is open until Saturday, June 27, at 06:32 UTC and passes
>>> if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.4.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see
>>> http://spark.apache.org/
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>


Re: enum-like types in Spark

2015-07-02 Thread Imran Rashid
Hi Stephen,

I'm not sure which link you are referring to for the example code -- but
yes, the recommendation is that you create the enum in Java, e.g. see

https://github.com/apache/spark/blob/v1.4.0/core/src/main/java/org/apache/spark/status/api/v1/StageStatus.java

Then nothing special is required to use it in Scala.  This method both uses
the overall type of the enum in the return value, and uses specific values
in the body:

https://github.com/apache/spark/blob/v1.4.0/core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala#L114

(I did delete the branches for the code that is *not* recommended anymore)
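
For example, using that enum from Scala needs no special handling (an
illustrative sketch, not code from Spark):

import org.apache.spark.status.api.v1.StageStatus

// Java enum constants are stable identifiers, so they work directly in Scala matches.
def describe(status: StageStatus): String = status match {
  case StageStatus.ACTIVE   => "running"
  case StageStatus.COMPLETE => "finished"
  case StageStatus.FAILED   => "failed"
  case _                    => "not started"
}

// values() and valueOf() come for free with Java enums:
StageStatus.values().foreach(s => println(describe(s)))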

Imran


On Wed, Jul 1, 2015 at 5:53 PM, Stephen Boesch  wrote:

> I am reviving an old thread here. The link for the example code for the
> java enum based solution is now dead: would someone please post an updated
> link showing the proper interop?
>
> Specifically: it is my understanding that Java enums may not be created
> within Scala.  So does the proposed solution require dropping out into Java
> to create the enums?
>
> 2015-04-09 17:16 GMT-07:00 Xiangrui Meng :
>
>> Using Java enums sound good. We can list the values in the JavaDoc and
>> hope Scala will be able to correctly generate docs for Java enums in
>> the future. -Xiangrui
>>
>> On Thu, Apr 9, 2015 at 10:59 AM, Imran Rashid 
>> wrote:
>> > any update here?  This is relevant for a currently open PR of mine --
>> I've
>> > got a bunch of new public constants defined w/ format #4, but I'd gladly
>> > switch to java enums.  (Even if we are just going to postpone this
>> decision,
>> > I'm still inclined to switch to java enums ...)
>> >
>> > just to be clear about the existing problem with enums & scaladoc: right
>> > now, the scaladoc knows about the enum class, and generates a page for
>> it,
>> > but it does not display the enum constants.  It is at least labeled as a
>> > java enum, though, so a savvy user could switch to the javadocs to see
>> the
>> > constants.
>> >
>> >
>> >
>> > On Mon, Mar 23, 2015 at 4:50 PM, Imran Rashid 
>> wrote:
>> >>
>> >> well, perhaps I overstated things a little, I wouldn't call it the
>> >> "official" solution, just a recommendation in the never-ending debate
>> (and
>> >> the recommendation from folks with their hands on scala itself).
>> >>
>> >> Even if we do get this fixed in scaladoc eventually -- as its not in
>> the
>> >> current versions, where does that leave this proposal?  personally I'd
>> >> *still* prefer java enums, even if it doesn't get into scaladoc.  btw,
>> even
>> >> with sealed traits, the scaladoc still isn't great -- you don't see the
>> >> values from the class, you only see them listed from the companion
>> object.
>> >> (though, that is somewhat standard for scaladoc, so maybe I'm reaching
>> a
>> >> little)
>> >>
>> >>
>> >>
>> >> On Mon, Mar 23, 2015 at 4:11 PM, Patrick Wendell 
>> >> wrote:
>> >>>
>> >>> If the official solution from the Scala community is to use Java
>> >>> enums, then it seems strange they aren't generated in scaldoc? Maybe
>> >>> we can just fix that w/ Typesafe's help and then we can use them.
>> >>>
>> >>> On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen 
>> wrote:
>> >>> > Yeah the fully realized #4, which gets back the ability to use it in
>> >>> > switch statements (? in Scala but not Java?) does end up being kind
>> of
>> >>> > huge.
>> >>> >
>> >>> > I confess I'm swayed a bit back to Java enums, seeing what it
>> >>> > involves. The hashCode() issue can be 'solved' with the hash of the
>> >>> > String representation.
>> >>> >
>> >>> > On Mon, Mar 23, 2015 at 8:33 PM, Imran Rashid > >
>> >>> > wrote:
>> >>> >> I've just switched some of my code over to the new format, and I
>> just
>> >>> >> want
>> >>> >> to make sure everyone realizes what we are getting into.  I went
>> from
>> >>> >> 10
>> >>> >> lines as java enums
>> >>> >>
>> >>> >>
>> >>> >>
>> https://github.com/squito/spark/blob/fef66058612ebf225e58dd5f5fea6bae1afd5b31/core/src/main/java/org/apache/spark/status/api/StageStatus.java#L20
>> >>> >>
>> >>> >> to 30 lines with the new format:
>> >>> >>
>> >>> >>
>> >>> >>
>> https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/status/api/v1/api.scala#L250
>> >>> >>
>> >>> >> its not just that its verbose.  each name has to be repeated 4
>> times,
>> >>> >> with
>> >>> >> potential typos in some locations that won't be caught by the
>> >>> >> compiler.
>> >>> >> Also, you have to manually maintain the "values" as you update the
>> set
>> >>> >> of
>> >>> >> enums, the compiler won't do it for you.
>> >>> >>
>> >>> >> The only downside I've heard for java enums is enum.hashcode().
>> OTOH,
>> >>> >> the
>> >>> >> downsides for this version are: maintainability / verbosity, no
>> >>> >> values(),
>> >>> >> more cumbersome to use from java, no enum map / enumset.
>> >>> >>
>> >>> >> I did put together a little util to at least get back the
>> equivalent
>> >>> >> of
>> >>> >> enum.valueOf() wi

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread shenyan zhen
+1
On Jun 30, 2015 8:28 PM, "Reynold Xin"  wrote:

> +1
>
> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.4.1!
>>
>> This release fixes a handful of known issues in Spark 1.4.0, listed here:
>> http://s.apache.org/spark-1.4.1
>>
>> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>> 60e08e50751fe3929156de956d62faea79f5b801
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> [published as version: 1.4.1]
>> https://repository.apache.org/content/repositories/orgapachespark-1118/
>> [published as version: 1.4.1-rc1]
>> https://repository.apache.org/content/repositories/orgapachespark-1119/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.4.1!
>>
>> The vote is open until Saturday, June 27, at 06:32 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.4.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>


Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Sean Owen
I wanted to flag a potential blocker here, but pardon me if this is
still after all this time just my misunderstanding of the POM/build
theory --

So this is the final candidate release POM, right?
https://repository.apache.org/content/repositories/orgapachespark-1118/org/apache/spark/spark-core_2.10/1.4.1/spark-core_2.10-1.4.1.pom

Compare to for example:
https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.4.0/spark-core_2.10-1.4.0.pom

and see:

https://issues.apache.org/jira/browse/SPARK-8781

For instance, in 1.4.0 it had

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.10</artifactId>
    <version>1.4.0</version>
    <scope>compile</scope>
  </dependency>

but now that's:

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_${scala.binary.version}</artifactId>
    <version>${project.version}</version>
  </dependency>


JIRA suggests it had to do with adding:

  <createDependencyReducedPom>false</createDependencyReducedPom>

Am I missing something or is that indeed not going to work as a release POM?

On Wed, Jun 24, 2015 at 6:37 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.4.1!
>
> This release fixes a handful of known issues in Spark 1.4.0, listed here:
> http://s.apache.org/spark-1.4.1
>
> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
> 60e08e50751fe3929156de956d62faea79f5b801
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> [published as version: 1.4.1]
> https://repository.apache.org/content/repositories/orgapachespark-1118/
> [published as version: 1.4.1-rc1]
> https://repository.apache.org/content/repositories/orgapachespark-1119/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>
> Please vote on releasing this package as Apache Spark 1.4.1!
>
> The vote is open until Saturday, June 27, at 06:32 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
