Re: Hoping to contribute code - Spark 2.1.1 Documentation

2017-05-02 Thread Reynold Xin
Liucht,

Thanks for the interest. You are more than welcome to contribute a pull
request to fix the issue at https://github.com/apache/spark



On Tue, May 2, 2017 at 7:44 PM, cht liu  wrote:

> Hello, Spark organization leaders:
> This is my first time contributing to the Spark code, and I do not know much
> about the process. I filed a bug in JIRA: SPARK-20570: on the
> spark.apache.org home page, when I click the Latest Release (Spark 2.1.1)
> item under the Documentation menu, the page that opens ("latest") displays a
> 2.1.0 label in the upper left corner of the page.
> I would like to contribute a fix for this to the Spark code. I hope you can
> give me the chance.
> Thanks very much!
>
> liucht
>


Hoping to contribute code - Spark 2.1.1 Documentation

2017-05-02 Thread cht liu
Hello, Spark organization leaders:
This is my first time contributing to the Spark code, and I do not know much
about the process. I filed a bug in JIRA: SPARK-20570: on the spark.apache.org
home page, when I click the Latest Release (Spark 2.1.1) item under the
Documentation menu, the page that opens ("latest") displays a 2.1.0 label in
the upper left corner of the page.
I would like to contribute a fix for this to the Spark code. I hope you can
give me the chance.
Thanks very much!

liucht


[ANNOUNCE] Apache Spark 2.1.1

2017-05-02 Thread Michael Armbrust
We are happy to announce the availability of Spark 2.1.1!

Apache Spark 2.1.1 is a maintenance release, based on the branch-2.1
maintenance branch of Spark. We strongly recommend that all 2.1.x users
upgrade to this stable release.

To download Apache Spark 2.1.1 visit http://spark.apache.org/downloads.html

We would like to acknowledge all community members for contributing patches
to this release.


[Spark Streaming] Dynamic Broadcast Variable Update

2017-05-02 Thread Nipun Arora
Hi All,

To support our Spark Streaming-based anomaly detection tool, we have made a
patch to Spark 1.6.2 to dynamically update broadcast variables.

I'll first explain our use case, which I believe should be common to
several people using Spark Streaming applications. Broadcast variables are
often used to store values such as machine learning models, which can then be
applied to streaming data to score it and get the desired results (in our case,
anomalies). Unfortunately, in current Spark, broadcast variables are final and
can only be initialized once, before the initialization of the streaming
context. Hence, if a new model is learned, the streaming system cannot be
updated without shutting down the application, broadcasting again, and
restarting the application. Our goal was to re-broadcast variables without
requiring any downtime of the streaming service.

The key to this implementation is a live re-broadcastVariable() interface,
which can be triggered between micro-batch executions, without any restart
required for the streaming application. At a high level, this is done by
re-fetching the broadcast variable information from the Spark driver and then
redistributing it to the workers. The micro-batch execution is blocked while
the update is made, by taking a lock on the execution. We have already tested
this in a prototype deployment of our anomaly detection service and can
successfully re-broadcast the broadcast variables with no downtime.
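
For readers who want a feel for the idea, here is a rough application-level
approximation of the pattern (names are illustrative, not the patch's actual
API; the patch itself works inside Spark and additionally locks micro-batch
execution during the swap). It keeps a mutable handle to the current Broadcast
on the driver and swaps it between micro-batches:

import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.StreamingContext
import scala.reflect.ClassTag

class ModelHolder[T: ClassTag](ssc: StreamingContext, initial: T) {
  @volatile private var current: Broadcast[T] = ssc.sparkContext.broadcast(initial)

  // Driver-side: retire the old copy and broadcast the new model before the next batch runs.
  def update(newModel: T): Unit = synchronized {
    current.unpersist(blocking = true)
    current = ssc.sparkContext.broadcast(newModel)
  }

  // Called on the driver (e.g. inside transform/foreachRDD) so each batch captures the
  // latest Broadcast handle in its closure.
  def get: Broadcast[T] = current
}

Each micro-batch then reads the latest handle on the driver, e.g.
stream.transform { rdd => val model = holder.get; rdd.filter(r => isAnomaly(model.value, r)) },
where isAnomaly stands in for whatever scoring the model does.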

We would like to integrate these changes into Spark. Can anyone please let me
know the process for submitting patches / new features to Spark? Also, I
understand that the current version of Spark is 2.1; however, our changes have
been made and tested on Spark 1.6.2. Will this be a problem?

Thanks
Nipun


Re: Spark 2.2.0 or Spark 2.3.0?

2017-05-02 Thread Michael Armbrust
An RC for 2.2.0 was released last week. Please test.

Note that update mode has been supported since 2.0.
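
For anyone who has not tried it yet, a minimal Structured Streaming sketch of
update mode (illustrative only; the socket source and console sink are just
placeholders, and outputMode("update") emits only the rows that changed since
the previous trigger rather than the full result table):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("UpdateModeExample").getOrCreate()
import spark.implicits._

// Count words arriving on a socket and print only the updated counts each trigger.
val lines = spark.readStream.format("socket")
  .option("host", "localhost").option("port", 9999).load()

val counts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .start()

query.awaitTermination()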

On Mon, May 1, 2017 at 10:43 PM, kant kodali  wrote:

> Hi All,
>
> If I understand the Spark standard release process correctly, it looks
> like the official release is going to come sometime at the end of this month,
> and it is going to be 2.2.0 (not 2.3.0), right? I am eagerly looking forward
> to Spark 2.2.0 because of the "update mode" option in Spark Streaming. Please
> correct me if I am wrong.
>
> Thanks!
>


Re: Spark 2.2.0 or Spark 2.3.0?

2017-05-02 Thread Felix Cheung
Yes 2.2.0


From: kant kodali 
Sent: Monday, May 1, 2017 10:43:44 PM
To: dev
Subject: Spark 2.2.0 or Spark 2.3.0?

Hi All,

If I understand the Spark standard release process correctly, it looks like the
official release is going to come sometime at the end of this month, and it is
going to be 2.2.0 (not 2.3.0), right? I am eagerly looking forward to Spark
2.2.0 because of the "update mode" option in Spark Streaming. Please correct me
if I am wrong.

Thanks!


Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-05-02 Thread Nick Pentreath
I won't +1 just given that it seems certain there will be another RC and
there are the outstanding ML QA blocker issues.

But clean build and test for JVM and Python tests LGTM on CentOS Linux
7.2.1511, OpenJDK 1.8.0_111

On Mon, 1 May 2017 at 22:42 Frank Austin Nothaft 
wrote:

> Hi Ryan,
>
> IMO, the problem is that the Spark Avro version conflicts with the Parquet
> Avro version. As discussed upthread, I don't think there's a way to
> *reliably* make sure that Avro 1.8 is on the classpath first while using
> spark-submit. Relocating Avro in our project wouldn't solve the problem,
> because the NoSuchMethodError is thrown from the internals of
> ParquetAvroOutputFormat, not from code in our project.
>
> Regards,
>
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>
> On May 1, 2017, at 12:33 PM, Ryan Blue  wrote:
>
> Michael, I think that the problem is with your classpath.
>
> Spark has a dependency on Avro 1.7.7, which can't be changed. Your project is
> what pulls in parquet-avro and transitively Avro 1.8. Spark has no runtime
> dependency on Avro 1.8. It is understandably annoying that using the same
> version of Parquet for your parquet-avro dependency is what causes your
> project to depend on Avro 1.8, but Spark's dependencies aren't a problem
> because its Parquet dependency doesn't bring in Avro.
>
> There are a few ways around this:
> 1. Make sure Avro 1.8 is found in the classpath first
> 2. Shade Avro 1.8 in your project (assuming Avro classes aren't shared)
> 3. Use parquet-avro 1.8.1 in your project, which I think should work with
> 1.8.2 and avoid the Avro change
>
> The work-around in Spark is for tests, which do use parquet-avro. We can
> look at a Parquet 1.8.3 that avoids this issue, but I think this is
> reasonable for the 2.2.0 release.
>
> rb
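
To make option 2 above concrete, a minimal sketch of relocating Avro with
sbt-assembly (illustrative only; Maven's shade plugin supports an equivalent
relocation, and, as Frank notes above, relocation only helps when the 1.8-only
call happens in your own shaded code rather than inside Spark's own Parquet
path):

// build.sbt fragment -- a hedged sketch, assuming the sbt-assembly plugin is already enabled.
assemblyShadeRules in assembly := Seq(
  // Rewrite org.apache.avro.* references inside the uber jar to a private package so the
  // application's Avro 1.8 classes never collide with the Avro 1.7.7 that spark-submit provides.
  ShadeRule.rename("org.apache.avro.**" -> "myshaded.avro.@1").inAll
)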
>
> On Mon, May 1, 2017 at 12:08 PM, Michael Heuer  wrote:
>
>> Please excuse me if I'm misunderstanding -- the problem is not with our
>> library or our classpath.
>>
>> There is a conflict within Spark itself, in that Parquet 1.8.2 expects to
>> find Avro 1.8.0 on the runtime classpath and sees 1.7.7 instead.  Spark
>> already has to work around this for unit tests to pass.
>>
>>
>>
>> On Mon, May 1, 2017 at 2:00 PM, Ryan Blue  wrote:
>>
>>> Thanks for the extra context, Frank. I agree that it sounds like your
>>> problem comes from the conflict between your Jars and what comes with
>>> Spark. It's the same concern that makes everyone shudder when anything has a
>>> public dependency on Jackson. :)
>>>
>>> What we usually do to get around situations like this is to relocate the
>>> problem library inside the shaded Jar. That way, Spark uses its version of
>>> Avro and your classes use a different version of Avro. This works if you
>>> don't need to share classes between the two. Would that work for your
>>> situation?
>>>
>>> rb
>>>
>>> On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers 
>>> wrote:
>>>
 Sounds like you are running into the fact that you cannot really put
 your own classes before Spark's on the classpath? Spark's switches to support
 this never really worked for me either.

 Inability to control the classpath + inconsistent jars => trouble?

 On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft <
 fnoth...@berkeley.edu> wrote:

> Hi Ryan,
>
> We do set Avro to 1.8 in our downstream project. We also set Spark as
> a provided dependency and build an überjar. We run via spark-submit, which
> builds the classpath with our überjar and all of the Spark deps. This leads
> to Avro 1.7.7 getting picked up off the classpath at runtime, which causes
> the NoSuchMethodError to occur.
>
> Regards,
>
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
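
For context, a hedged sketch of what that dependency setup typically looks like
in sbt (coordinates and versions are illustrative, not the project's actual
build file):

// build.sbt fragment -- Spark is "provided" (spark-submit supplies it at runtime), while
// Avro 1.8 and parquet-avro are bundled into the assembled uber jar. At runtime spark-submit's
// classpath wins, so parquet-avro ends up calling into Spark's Avro 1.7.7 and fails.
libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"   % "2.2.0" % "provided",
  "org.apache.avro"     % "avro"         % "1.8.1",
  "org.apache.parquet"  % "parquet-avro" % "1.8.2"
)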
>
> On May 1, 2017, at 11:31 AM, Ryan Blue  wrote:
>
> Frank,
>
> The issue you're running into is caused by using parquet-avro with
> Avro 1.7. Can't your downstream project set the Avro dependency to 1.8?
> Spark can't update Avro because it is a breaking change that would force
> users to rebuild specific Avro classes in some cases. But you should be
> free to use Avro 1.8 to avoid the problem.
>
> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft <
> fnoth...@berkeley.edu> wrote:
>
>> Hi Ryan et al,
>>
>> The issue we’ve seen using a build of the Spark 2.2.0 branch from a
>> downstream project is that parquet-avro uses one of the new Avro 1.8.0
>> methods, and you get a NoSuchMethodError since Spark puts Avro 1.7.7 as a
>> dependency. My colleague Michael (who posted earlier on this thread)
>> documented this in SPARK-19697
>>