Re: Time for 2.3.2?

2018-06-29 Thread John Zhuge
+1  Looking forward to the critical fixes in 2.3.2.

On Thu, Jun 28, 2018 at 9:37 AM Ryan Blue  wrote:

> +1
>
> On Thu, Jun 28, 2018 at 9:34 AM Xiao Li  wrote:
>
>> +1. Thanks, Saisai!
>>
>> The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP.
>>
>> Thanks,
>>
>> Xiao
>>
>> 2018-06-27 23:28 GMT-07:00 Takeshi Yamamuro :
>>
>>> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.
>>>
>>> On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang 
>>> wrote:
>>>
 +1

 On Thu, Jun 28, 2018 at 2:06 PM, Wenchen Fan wrote:

> Hi Saisai, that's great! please go ahead!
>
> On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao 
> wrote:
>
>> +1. As Marcelo mentioned, these issues seem quite severe.
>>
>> I can work on the release if we are short of hands :).
>>
>> Thanks
>> Jerry
>>
>>
>> On Thu, Jun 28, 2018 at 11:40 AM, Marcelo Vanzin wrote:
>>
>>> +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
>>> for those out.
>>>
>>> (Those are what delayed 2.2.2 and 2.1.3 for those watching...)
>>>
>>> On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan 
>>> wrote:
>>> > Hi all,
>>> >
>>> > Spark 2.3.1 was released just a while ago, but unfortunately we
>>> > discovered and fixed some critical issues afterward.
>>> >
>>> > SPARK-24495: SortMergeJoin may produce wrong result.
>>> > This is a serious correctness bug, and it is easy to hit: the query
>>> > has a duplicated join key from the left table, e.g.
>>> > `WHERE t1.a = t2.b AND t1.a = t2.c`, and the join is a sort merge
>>> > join. This bug is only present in Spark 2.3.
>>> >
>>> > SPARK-24588: stream-stream join may produce wrong result.
>>> > This is a correctness bug in a new feature of Spark 2.3: the
>>> > stream-stream join. Users can hit this bug if one of the join sides
>>> > is partitioned by a subset of the join keys.
>>> >
>>> > SPARK-24552: Task attempt numbers are reused when stages are retried.
>>> > This is a long-standing bug in the output committer that may
>>> > introduce data corruption.
>>> >
>>> > SPARK-24542: UDFXPath allows users to pass carefully crafted XML to
>>> > access arbitrary files.
>>> > This is a potential security issue if users build an access control
>>> > module upon Spark.
>>> >
>>> > I think we need a Spark 2.3.2 to address these issues (especially
>>> > the correctness bugs) ASAP. Any thoughts?
>>> >
>>> > Thanks,
>>> > Wenchen
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
> --
> John Zhuge
>
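
For readers who want to see the SPARK-24495 query shape concretely, here is a minimal pure-Python sketch. It does not use Spark and does not reproduce the bug; it only shows the result a join on `t1.a = t2.b AND t1.a = t2.c` should return, first by a naive nested loop and then by a toy sort-merge join keyed on (a, a) versus (b, c). The table contents and function names are made up for illustration.

```python
def nested_loop_join(t1, t2):
    """Reference semantics: keep pairs where t1.a == t2.b and t1.a == t2.c."""
    return sorted((a, b, c) for (a,) in t1 for (b, c) in t2 if a == b and a == c)


def sort_merge_join(t1, t2):
    """Toy sort-merge join; the left key (a, a) repeats the same column."""
    left = sorted(((a, a), (a,)) for (a,) in t1)
    right = sorted(((b, c), (b, c)) for (b, c) in t2)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, (a,) in left[i:i_end]:
                for _, (b, c) in right[j:j_end]:
                    out.append((a, b, c))
            i, j = i_end, j_end
    return sorted(out)


# Hypothetical data: t1 has one column a; t2 has columns b and c.
T1 = [(1,), (2,), (2,), (3,)]
T2 = [(1, 1), (1, 2), (2, 2), (3, 1), (3, 3)]

EXPECTED = nested_loop_join(T1, T2)
RESULT = sort_merge_join(T1, T2)
```

Only rows of t2 where b and c are both equal to a should survive, so `(1, 2)` and `(3, 1)` are dropped. Comparing a sort merge join against a nested-loop reference like this is one simple way to check a suspect plan.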


[RESULT] [VOTE] Spark 2.1.3 (RC2)

2018-06-29 Thread Marcelo Vanzin
The vote passes. Thanks to all who helped with the release!

I'll start publishing everything today, and an announcement will
be sent when artifacts have propagated to the mirrors (probably
early next week).

+1 (* = binding):
- Marcelo Vanzin *
- Sean Owen *
- Felix Cheung *
- Tom Graves *

+0: None

-1: None


-- 
Marcelo




Re: Jenkins build errors

2018-06-29 Thread petar . zecevic


The problem was with the changes upstream. Fetching upstream and rebasing resolved 
it, and the build is now passing.

I also added a design doc and made the JIRA description a bit clearer 
(https://issues.apache.org/jira/browse/SPARK-24020) so I hope it will get 
merged soon.

Thanks,
Petar


Sean Owen wrote:

> Also confused about this one, as many builds succeed. One possible difference 
> is that this failure is in the Hive tests, so are you building and testing 
> with -Phive locally where it works? That still does not explain the download 
> failure. It could be a mirror problem, throttling, etc. But then again, I 
> haven't spotted another failing Hive test.
>
> On Wed, Jun 20, 2018 at 1:55 AM Petar Zecevic  wrote:
>
>  It's still dying. Back to this error (it used to be spark-2.2.0 before):
>
> java.io.IOException: Cannot run program "./bin/spark-submit" (in directory 
> "/tmp/test-spark/spark-2.1.2"): error=2, No such file or directory
>  So, a mirror is missing that Spark version... I don't understand why nobody 
> else has these errors and I get them every time without fail.
>
>  Petar





Re: [SPARK-24579] SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks

2018-06-29 Thread Li Jin
Hi Xiangrui,

Thanks for sending this out. I have left some comments on the google doc:
https://docs.google.com/document/d/1dFOFV3LOu6deSNd8Ndp87-wsdxQqA9cnkuj35jUlrmQ/edit#heading=h.84jotgsrp6bj

Look forward to your response.

Li

On Mon, Jun 18, 2018 at 11:33 AM, Xiangrui Meng  wrote:

> Hi all,
>
> I posted a new SPIP on optimized data exchange between Spark and DL/AI
> frameworks at SPARK-24579. It took inputs from
> offline conversations with several Spark committers and contributors at
> Spark+AI summit conference. Please take a look and let me know your
> thoughts in JIRA comments. Thanks!
>
> Best,
> Xiangrui
> --
>
> Xiangrui Meng
>
> Software Engineer
>
> Databricks Inc.
>


Re: Time for 2.3.2?

2018-06-29 Thread Yu, Yucai
+1. We are evaluating 2.3.1, please release Spark 2.3.2 ASAP.

Thanks,
Yucai


Re: Time for 2.3.2?

2018-06-29 Thread gvramana
+1. Need to release Spark 2.3.2 ASAP

Thanks,
Venkata Ramana Gollamudi






Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-29 Thread Marco Gaido
Yes, I'd say so.

2018-06-29 4:43 GMT+02:00 吴晓菊 :

> And it should be generic for HashJoin, not only broadcast join, right?
>
>
> Chrysan Wu
> 吴晓菊
> Phone:+86 17717640807
>
>
> 2018-06-29 10:42 GMT+08:00 吴晓菊 :
>
>> Sorry for the mistake. You are right: the output ordering of a broadcast join
>> can be the ordering of the big table for some types of join. I will prepare a PR
>> for you to review later. Thanks a lot!
>>
>>
>> Chrysan Wu
>> 吴晓菊
>> Phone:+86 17717640807
>>
>>
>> 2018-06-29 0:00 GMT+08:00 Wenchen Fan :
>>
>>> SortMergeJoin sorts its children by join key, but broadcast join does
>>> not. I think the output ordering of broadcast join has nothing to do with
>>> join key.
>>>
>>> On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido 
>>> wrote:
>>>
 I think the outputOrdering would be that of the big table (if any),
 and it wouldn't matter whether this involves the join keys or not. Am I wrong?

 2018-06-28 17:01 GMT+02:00 吴晓菊 :

> Thanks for the reply.
> By looking into SortMergeJoinExec, I think we can follow what
> SortMergeJoin does: for some types of join, if the children are ordered on
> the join keys, we can report the ordered join keys as the output ordering.
>
>
> Chrysan Wu
> 吴晓菊
> Phone:+86 17717640807
>
>
> 2018-06-28 22:53 GMT+08:00 Wenchen Fan :
>
>> SortMergeJoin only reports ordering of the join keys, not the output
>> ordering of any child.
>>
>> It seems reasonable to me that broadcast join should respect the
>> output ordering of the children. Feel free to submit a PR to fix it, 
>> thanks!
>>
>> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊  wrote:
>>
>>> Why can't we use the output ordering of the big table?
>>>
>>>
>>> Chrysan Wu
>>> Phone:+86 17717640807
>>>
>>>
>>> 2018-06-28 21:48 GMT+08:00 Marco Gaido :
>>>
 The easy answer to this is that SortMergeJoin ensures an
 outputOrdering, while BroadcastHashJoin doesn't, i.e. after running a
 BroadcastHashJoin you don't know what the order of the
 output will be, since nothing enforces it.

 Hope this helps.
 Thanks.
 Marco

 2018-06-28 15:46 GMT+02:00 吴晓菊 :

>
> We see SortMergeJoinExec is implemented with outputPartitioning and
> outputOrdering, while BroadcastHashJoinExec is only implemented with
> outputPartitioning. Why is it designed this way?
>
> Chrysan Wu
> Phone:+86 17717640807
>
>

>>>
>

>>
>
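
The point made in this thread (a hash join that streams over the big table can keep that table's ordering) can be illustrated with a small pure-Python sketch. This is a toy, single-partition inner join, not Spark's BroadcastHashJoinExec; the function name, row layout, and data are assumptions made for the example.

```python
from collections import defaultdict


def broadcast_hash_join(big, small, big_key, small_key):
    """Toy inner hash join: build a hash table on the small side, stream the big side."""
    table = defaultdict(list)
    for row in small:
        table[small_key(row)].append(row)
    for row in big:  # rows are visited in the big table's existing order
        for match in table.get(big_key(row), []):
            yield row + match


# Hypothetical data: BIG is already sorted by its first column (ts),
# which is not the join key; the join key is the second column.
BIG = [(1, "b"), (2, "a"), (3, "c"), (4, "a"), (5, "b")]
SMALL = [("a", "alpha"), ("b", "beta")]

JOINED = list(
    broadcast_hash_join(BIG, SMALL, big_key=lambda r: r[1], small_key=lambda r: r[0])
)
```

Because the streamed side is read front to back and each row is emitted as soon as it is matched, the output stays sorted by `ts` even though `ts` is not a join key. As noted in the thread, whether this holds in general depends on the join type.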