Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread Nick Pentreath
-1 for me as we elevated https://issues.apache.org/jira/browse/SPARK-23377 to
a Blocker. It should be fixed before release.

On Thu, 15 Feb 2018 at 07:25 Holden Karau  wrote:

> If this is a blocker in your view then the vote thread is an important
> place to mention it. I'm not entirely sure of all the places these methods
> are used, so I'll defer to srowen and folks, but for the ML-related
> implications, in the past we've allowed people to set the hashing function
> when we've introduced changes.
>
> On Feb 15, 2018 2:08 PM, "mrkm4ntr"  wrote:
>
>> I was advised in the GitHub discussion to post here. I do not know what
>> to do about the discussion being split across two places.
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread Holden Karau
If this is a blocker in your view then the vote thread is an important
place to mention it. I'm not entirely sure of all the places these methods
are used, so I'll defer to srowen and folks, but for the ML-related
implications, in the past we've allowed people to set the hashing function
when we've introduced changes.
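
As a rough illustration of letting users choose the hashing function, here is a
minimal Python sketch. It is not Spark's implementation: the function names,
the `hash_algorithm` parameter and the CRC32/MD5 choices are all assumptions,
picked only because they ship with Python's standard library.

```python
import hashlib
import zlib
from typing import Callable, Dict, Iterable


# Hypothetical stand-ins for an "old" and a "new" hash function. Spark uses
# different hashes; these two are chosen only because they are in the stdlib.
def crc32_hash(term: str) -> int:
    return zlib.crc32(term.encode("utf-8"))


def md5_hash(term: str) -> int:
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# Registry of selectable algorithms (names are assumptions for this sketch).
HASH_FUNCTIONS: Dict[str, Callable[[str], int]] = {
    "crc32": crc32_hash,
    "md5": md5_hash,
}


def hash_features(
    terms: Iterable[str], num_features: int, hash_algorithm: str = "md5"
) -> Dict[int, float]:
    """Map terms to a sparse count vector {index: count} via the hashing trick."""
    hash_fn = HASH_FUNCTIONS[hash_algorithm]  # KeyError for unknown names
    vector: Dict[int, float] = {}
    for term in terms:
        index = hash_fn(term) % num_features
        vector[index] = vector.get(index, 0.0) + 1.0
    return vector
```

With a parameter like this, a pipeline built against the old hash can keep
asking for it by name after the default changes, so existing models keep
seeing the same feature indices.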

On Feb 15, 2018 2:08 PM, "mrkm4ntr"  wrote:

> I was advised in the GitHub discussion to post here. I do not know what
> to do about the discussion being split across two places.


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread mrkm4ntr
I was advised in the GitHub discussion to post here. I do not know what to
do about the discussion being split across two places.






Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread Holden Karau
So it's currently tagged as minor and under consideration for 2.4.0. Do you
think this priority is incorrect? This doesn't seem like a regression or a
correctness issue, so normally we wouldn't hold the release. Of course you're
free to vote however you choose; I'm just providing some additional context
around how we tend to do releases.

On Feb 14, 2018 11:03 PM, "mrkm4ntr"  wrote:

I'm -1 because of this issue.
I want to fix the hashing implementation in FeatureHasher before
FeatureHasher is released in 2.3.0.

https://issues.apache.org/jira/browse/SPARK-23381
https://github.com/apache/spark/pull/20568

I will fix it soon.





Re: Why python cluster mode is not supported in standalone cluster?

2018-02-14 Thread Ashwin Sai Shankar
+dev mailing list (since I didn't get a response from the user DL)

On Tue, Feb 13, 2018 at 12:20 PM, Ashwin Sai Shankar 
wrote:

> Hi Spark users!
> I noticed that Spark doesn't allow Python apps to run in cluster mode in a
> Spark standalone cluster. Does anyone know the reason? I checked JIRA but
> couldn't find anything relevant.
>
> Thanks,
> Ashwin
>


Re: A new external catalog

2018-02-14 Thread Tayyebi, Ameen
Newbie question:

I want to add system/integration tests for the new functionality. There is a 
set of existing tests around Spark Catalog that I can leverage. Great. The 
provider I’m writing, though, is backed by a web service that is part of an AWS 
account. I can write the tests using a mocked client that somehow clones the 
behavior of the web service, but I’ll get the most value if I actually run the 
tests against a real AWS Glue account.
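
To make the two options concrete, here is a minimal Python sketch of one way to
run the same checks against either a mocked client or the real service. Every
name in it is hypothetical (`InMemoryCatalogClient`, `make_client`, the
`CATALOG_IT_USE_REAL_SERVICE` variable), and the real client is left as a
caller-supplied factory rather than an actual Glue call.

```python
import os
from typing import Callable, Dict, List, Mapping, Optional


class InMemoryCatalogClient:
    """Fake client that mimics a small slice of a catalog web service."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[str]] = {}

    def create_database(self, name: str) -> None:
        if name in self._tables:
            raise ValueError(f"database already exists: {name}")
        self._tables[name] = []

    def create_table(self, database: str, table: str) -> None:
        self._tables[database].append(table)

    def list_tables(self, database: str) -> List[str]:
        return sorted(self._tables[database])


def make_client(
    real_factory: Optional[Callable[[], object]] = None,
    env: Optional[Mapping[str, str]] = None,
):
    """Return the real client only when explicitly enabled, else the fake."""
    env = os.environ if env is None else env
    # Hypothetical opt-in switch so CI without credentials uses the fake.
    if env.get("CATALOG_IT_USE_REAL_SERVICE") == "1" and real_factory is not None:
        return real_factory()
    return InMemoryCatalogClient()


def check_create_and_list(client) -> None:
    """A test body written once and run against whichever client is passed."""
    client.create_database("it_db")
    client.create_table("it_db", "b")
    client.create_table("it_db", "a")
    assert client.list_tables("it_db") == ["a", "b"]
```

The point of the design is that the default build never needs credentials,
while anyone with an account can opt in and run the identical assertions
against the real backend.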

How do you guys deal with external dependencies for system tests? Is there an 
AWS account that is used for this purpose by any chance?

Thanks,
-Ameen

From: Steve Loughran 
Date: Tuesday, February 13, 2018 at 5:01 PM
To: "Tayyebi, Ameen" 
Cc: Apache Spark Dev 
Subject: Re: A new external catalog




On 13 Feb 2018, at 21:20, Tayyebi, Ameen wrote:

Yes, I’m thinking about upgrading to these versions:

1.9.0
1.11.272

from the current ones:

1.7.3
1.11.76

1.11.272 is the earliest SDK version that has Glue.

How about I let the build system run the tests and if things start breaking I 
fall back to shading Glue’s specific SDK?


FWIW, some of the other trouble spots are not functional; they're log overflow:

https://issues.apache.org/jira/browse/HADOOP-15040
https://issues.apache.org/jira/browse/HADOOP-14596

Cloudera collaborators and I are testing the shaded 1.11.271 JAR and will go 
with that into Hadoop 3.1 if we're happy. That's not so much for new 
features as for "stack traces throughout the log", which seems to be a recurrent 
issue with the JARs, and one which often slips by CI build runs. If it wasn't 
for that, we'd have stuck with 1.11.199 because it didn't have any issues that 
we hadn't already got under control 
(https://github.com/aws/aws-sdk-java/issues/1211).

Like I said: upgrades bring fear


From: Steve Loughran
Date: Tuesday, February 13, 2018 at 3:34 PM
To: "Tayyebi, Ameen"
Cc: Apache Spark Dev
Subject: Re: A new external catalog





On 13 Feb 2018, at 19:50, Tayyebi, Ameen wrote:


The biggest challenge is that I had to upgrade the AWS SDK to a newer version 
so that it includes the Glue client, since Glue is a new service. So far, I 
haven’t seen any jar hell issues, but that’s the main drawback I can see. I’ve 
made sure the version is in sync with the Kinesis client used by the 
spark-streaming module.

Funnily enough, I'm currently updating the s3a troubleshooting doc, the latest 
version up front saying

"Whatever problem you have, changing the AWS SDK version will not fix things, 
only change the stack traces you see."

https://github.com/steveloughran/hadoop/blob/s3/HADOOP-15076-trouble-and-perf/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md

Upgrading AWS SDKs is, sadly, often viewed with almost the same fear as guava, 
especially if it's the unshaded version which forces in a version of jackson.

Which SDK version are you proposing? 1.11.x ?




Apache EU Roadshow CFP Closing Soon (23 February)

2018-02-14 Thread Sharan F

Hello Everyone

This is an initial reminder to let you all know that we are holding an 
Apache EU Roadshow co-located with FOSS Backstage in Berlin on 13th and 
14th June 2018. https://s.apache.org/tCHx


The Call for Proposals (CFP) for the Apache EU Roadshow is currently 
open and will close at the end of next week, so if you have been 
delaying making a submission because the closing date seemed a long way 
off, then it's time to start getting your proposals submitted.


So what are we looking for?
We will have 2 Apache Devrooms available during the 2 day Roadshow, so we 
are looking for projects, including incubating ones, to submit 
presentations, panel discussions, BoFs, or workshop proposals. The main 
focus of the Roadshow will be IoT, Cloud, Httpd and Tomcat, so if your 
project is involved in or around any of these technologies at Apache 
then we are very interested in hearing from you.


Community and collaboration are important at Apache, so if your project is 
interested in organising a project sprint, meetup or hackathon during 
the Roadshow, then please submit it in the CFP as we do have some space 
available to allocate for these.


If you want to submit a talk on open source community related 
topics such as the Apache Way, governance or legal aspects, then please 
submit these to the CFP for FOSS Backstage.


Tickets for the Apache EU Roadshow are included as part of the 
registration for FOSS Backstage, so to attend the Roadshow you will need 
to register for FOSS Backstage. Early Bird tickets are still available 
until the 21st February 2018.


Please see below for important URLs to remember:

-  To submit a CFP for the Apache EU Roadshow:
   http://apachecon.com/euroadshow18/

-  To submit a CFP for FOSS Backstage:
   https://foss-backstage.de/call-papers

-  To register to attend the Apache EU Roadshow and/or FOSS Backstage:
   https://foss-backstage.de/tickets


For further updates and information about the Apache EU Roadshow please 
check http://apachecon.com/euroadshow18/


Thanks
Sharan Foga, VP Apache Community Development


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread mrkm4ntr
I'm -1 because of this issue.
I want to fix the hashing implementation in FeatureHasher before
FeatureHasher is released in 2.3.0.

https://issues.apache.org/jira/browse/SPARK-23381
https://github.com/apache/spark/pull/20568

I will fix it soon.






Re: A new external catalog

2018-02-14 Thread Tayyebi, Ameen
Thanks a lot, Steve. I’ll go through the JIRAs you linked in detail. I took a 
quick look and am sufficiently scared for now. I had run into that warning from 
the S3 stream before. Sigh.





Re: Regarding NimbusDS JOSE JWT jar 3.9 security vulnerability

2018-02-14 Thread sujith chacko
Hi Steve,

 While we are building Spark 2.1, this particular JWT jar gets added as a
transitive dependency of the Hadoop-auth-2.7.2 project. I discussed this
with a member of the Hadoop PMC, who will analyse the impact of this
particular issue on Hadoop. Once I have more information I will update
you about this.
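
For anyone who wants to check which jars in a build actually bundle the
library, a minimal Python sketch follows. It assumes the classes live under
the `com/nimbusds/` package path inside the jar; the function name and the
directory you point it at are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def jars_containing(jar_dir: str, package_prefix: str = "com/nimbusds/") -> List[str]:
    """Return the names of jars in jar_dir that contain entries under package_prefix."""
    matches: List[str] = []
    for jar_path in sorted(Path(jar_dir).glob("*.jar")):
        try:
            # A jar is a zip archive, so its entry names can be listed directly.
            with zipfile.ZipFile(jar_path) as jar:
                if any(name.startswith(package_prefix) for name in jar.namelist()):
                    matches.append(jar_path.name)
        except zipfile.BadZipFile:
            # Skip files that are named *.jar but are not valid archives.
            continue
    return matches
```

Calling `jars_containing` on the directory holding a distribution's jars lists
every archive that ships the package, including ones where it has been bundled
into another jar rather than shipped under its own file name.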

Thanks,
Sujith

On Wed, 14 Feb 2018 at 07 PM, Steve Loughran  wrote:

> might be coming in transitively
>
> https://issues.apache.org/jira/browse/HADOOP-14799
>
> On 13 Feb 2018, at 18:18, PJ Fanning  wrote:
>
> Hi Sujith,
> I didn't find the nimbusds dependency in any spark 2.2 jars. Maybe I missed
> something. Could you tell us which spark jar has the nimbusds dependency?
>


Re: Regarding NimbusDS JOSE JWT jar 3.9 security vulnerability

2018-02-14 Thread Steve Loughran
might be coming in transitively

https://issues.apache.org/jira/browse/HADOOP-14799

On 13 Feb 2018, at 18:18, PJ Fanning wrote:

Hi Sujith,
I didn't find the nimbusds dependency in any spark 2.2 jars. Maybe I missed
something. Could you tell us which spark jar has the nimbusds dependency?




