Re: Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread Takeshi Yamamuro
Congrats!

best,
takeshi

On Sat, Sep 30, 2017 at 8:47 AM, vaquar khan  wrote:

> Congrats Tejas
>
> Regards,
> Vaquar khan
>
> On Fri, Sep 29, 2017 at 4:33 PM, Mridul Muralidharan 
> wrote:
>
>> Congratulations Tejas !
>>
>> Regards,
>> Mridul
>>
>> On Fri, Sep 29, 2017 at 12:58 PM, Matei Zaharia 
>> wrote:
>> > Hi all,
>> >
>> > The Spark PMC recently added Tejas Patil as a committer on the
>> > project. Tejas has been contributing across several areas of Spark for
>> > a while, focusing especially on scalability issues and SQL. Please
>> > join me in welcoming Tejas!
>> >
>> > Matei
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> Regards,
> Vaquar Khan
> +1 -224-436-0783 <(224)%20436-0783>
> Greater Chicago
>



-- 
---
Takeshi Yamamuro


Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-29 Thread vaquar khan
+1 (non-binding)

Regards,
Vaquar khan

On Fri, Sep 29, 2017 at 1:52 PM, Ryan Blue 
wrote:

> +1 (non-binding)
>
> Checked all signatures/checksums for binaries and source, spot-checked
> maven artifacts. Thanks for fixing the signatures, Holden!
>
> On Fri, Sep 29, 2017 at 8:25 AM, Holden Karau 
> wrote:
>
>> As a follow-up, the JIRA for this is at
>> https://issues.apache.org/jira/browse/SPARK-22167
>>
>> On Fri, Sep 29, 2017 at 2:50 AM, Holden Karau 
>> wrote:
>>
>>> This vote is canceled and will be replaced with an RC3 once Felix and I
>>> figure out the R packaging issue.
>>>
>>> On Fri, Sep 29, 2017 at 1:03 AM Felix Cheung 
>>> wrote:
>>>
 -1

 (Sorry) spark-2.1.2-bin-hadoop2.7.tgz is missing the R directory, not
 sure why yet.

 Tested on multiple platforms as a source package (against the 2.1.1 jar);
 seemed fine except for this WARNING on R-devel

 * checking for code/documentation mismatches ... WARNING
 Codoc mismatches from documentation object 'attach':
 attach
   Code: function(what, pos = 2L, name = deparse(substitute(what),
  backtick = FALSE), warn.conflicts = TRUE)
   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
  warn.conflicts = TRUE)
   Mismatches in argument default values:
 Name: 'name' Code: deparse(substitute(what), backtick = FALSE)
 Docs: deparse(substitute(what))

 Checked the latest release, R 3.4.1, and the signature change wasn't
 there. This likely indicates an upcoming change in the next R release that
 could incur this new warning when we attempt to publish the package.

 Not sure what we can do now, since we work with multiple versions of R
 and they will then have different signatures.
 --
 *From:* Luciano Resende 
 *Sent:* Thursday, September 28, 2017 10:29:18 PM
 *To:* Holden Karau
 *Cc:* dev@spark.apache.org

 *Subject:* Re: [VOTE] Spark 2.1.2 (RC2)
 +1 (non-binding)

 Minor comments:
 The apache infra has a staging repository to add release candidates,
 and it might be better/simpler to use that instead of home.a.o. See
 https://dist.apache.org/repos/dist/dev/spark/.



 On Tue, Sep 26, 2017 at 9:47 PM, Holden Karau 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.1.2. The vote is open until Wednesday October 4th at 23:59
> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.2
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v2.1.2-rc2
>  (fabbb7f59e47590114366d14e15fbbff8c88593c)
>
> List of JIRA tickets resolved in this release can be found with this
> filter.
> 
>
> The release files, including signatures, digests, etc. can be found at:
> https://home.apache.org/~holden/spark-2.1.2-rc2-bin/
>
> Release artifacts are signed with a key from:
> https://people.apache.org/~holden/holdens_keys.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1251
>
> The documentation corresponding to this release can be found at:
> https://people.apache.org/~holden/spark-2.1.2-rc2-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in
> Java/Scala you can add the staging repository to your project's resolvers
> and test with the RC (make sure to clean up the artifact cache
> before/after so you don't end up building with an out-of-date RC going
> forward).
>
> *What should happen to JIRA tickets still targeting 2.1.2?*
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should be
> worked on immediately. Everything else please retarget to 2.1.3.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from 2.1.1. That
> being said if there is something 

Re: Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread vaquar khan
Congrats Tejas

Regards,
Vaquar khan

On Fri, Sep 29, 2017 at 4:33 PM, Mridul Muralidharan 
wrote:

> Congratulations Tejas !
>
> Regards,
> Mridul
>
> On Fri, Sep 29, 2017 at 12:58 PM, Matei Zaharia 
> wrote:
> > Hi all,
> >
> > The Spark PMC recently added Tejas Patil as a committer on the
> > project. Tejas has been contributing across several areas of Spark for
> > a while, focusing especially on scalability issues and SQL. Please
> > join me in welcoming Tejas!
> >
> > Matei
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


-- 
Regards,
Vaquar Khan
+1 -224-436-0783
Greater Chicago


Re: Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread Mridul Muralidharan
Congratulations Tejas !

Regards,
Mridul

On Fri, Sep 29, 2017 at 12:58 PM, Matei Zaharia  wrote:
> Hi all,
>
> The Spark PMC recently added Tejas Patil as a committer on the
> project. Tejas has been contributing across several areas of Spark for
> a while, focusing especially on scalability issues and SQL. Please
> join me in welcoming Tejas!
>
> Matei
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread shane knapp
congrats, and welcome!  :)

On Fri, Sep 29, 2017 at 12:58 PM, Matei Zaharia  wrote:
> Hi all,
>
> The Spark PMC recently added Tejas Patil as a committer on the
> project. Tejas has been contributing across several areas of Spark for
> a while, focusing especially on scalability issues and SQL. Please
> join me in welcoming Tejas!
>
> Matei
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread Matei Zaharia
Hi all,

The Spark PMC recently added Tejas Patil as a committer on the
project. Tejas has been contributing across several areas of Spark for
a while, focusing especially on scalability issues and SQL. Please
join me in welcoming Tejas!

Matei

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [discuss] Data Source V2 write path

2017-09-29 Thread Ryan Blue
> Spark doesn't know how to create a table in external systems like
Cassandra, and that's why it's currently done inside the data source writer.

This isn't a valid argument for doing this task in the writer for v2. If we
want to fix the problems with v1, we shouldn't continue to mix write
operations with table metadata changes simply because it is more convenient
and requires less refactoring.

I'm proposing that in v2 we move creation of file system tables outside of
the writer, but still in a non-public implementation. Cassandra and other
external stores would behave as they should today and assume the table
exists, or would wait to use the v2 API until there is catalog support.

The important thing is that we don't set a standard that writers can create
tables, which is going to lead to different behavior across implementations
when we have conflicts between an existing table's config and the options
passed into the writer.
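
To make that conflict concrete, here's a minimal sketch using the existing (v1)
DataFrameWriter API; the table and column names are made up for the example, and
it assumes a SparkSession `spark` with Hive support:

// Sketch only: the first write fixes the table's partitioning; a later write
// passes different partitioning through the writer.
val df = spark.range(100).selectExpr("id", "id % 10 AS bucket", "current_date() AS day")

// Creates the table partitioned by `day`.
df.write.format("parquet").partitionBy("day").saveAsTable("events")

// Conflicting per-write partitioning: today `insertInto` throws an
// AnalysisException rather than deciding which config wins; a v2 writer that
// creates tables itself would be free to pick either behavior, per source.
df.write.partitionBy("bucket").insertInto("events")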

> For now, Spark just assumes the data source writer takes care of it. For the
internal file format data source, I propose to pass partition/bucket
information to the writer via options; other data sources can define their
own behavior, e.g. they can also use the options, or disallow users from
writing data to a non-existent table and ask them to create the table in the
external system first.

The point is preventing data sources from defining their own behavior so we
can introduce consistent behavior across sources for v2.

rb

On Thu, Sep 28, 2017 at 8:49 PM, Wenchen Fan  wrote:

> > When this CTAS logical node is turned into a physical plan, the relation
> gets turned into a `DataSourceV2` instance and then Spark gets a writer and
> configures it with the proposed API. The main point of this is to pass the
> logical relation (with all of the user's options) through to the data
> source, not the writer. The data source creates the writer and can tell the
> writer what to do.
>
> Here is the problem: Spark doesn't know how to create a table in external
> systems like Cassandra, and that's why it's currently done inside the data
> source writer.
>
> In the future, we can add a new trait `CatalogSupport` for `DataSourceV2`,
> so that we can use your proposal and separate metadata management from the
> data source writer.
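
(Purely illustrative: `CatalogSupport` does not exist in Spark today; this is
just a sketch of the shape such a trait might take, with made-up method names.)

import org.apache.spark.sql.types.StructType

// Hypothetical trait: lets Spark ask the source to manage table metadata
// separately from the DataSourceV2 writer. All names here are invented.
trait CatalogSupport {
  def tableExists(table: String): Boolean
  def createTable(
      table: String,
      schema: StructType,
      partitionColumns: Seq[String],
      options: Map[String, String]): Unit
  def dropTable(table: String): Unit
}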
>
> For now, Spark just assumes the data source writer takes care of it. For the
> internal file format data source, I propose to pass partition/bucket
> information to the writer via options; other data sources can define their
> own behavior, e.g. they can also use the options, or disallow users from
> writing data to a non-existent table and ask them to create the table in the
> external system first.
>
>
>
> On Thu, Sep 28, 2017 at 5:45 AM, Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> On an unrelated note, is there any appetite for making the write path
>> also include an option to return elements that were not
>> able to be processed for some reason?
>>
>> Usage might be like
>>
>> saveAndIgnoreFailures() : Dataset
>>
>> So that if some records cannot be parsed by the datasource for writing,
>> or violate some contract with the datasource, the records can be returned
>> for further processing or dealt with by an alternate system.
>>
>> On Wed, Sep 27, 2017 at 12:40 PM Ryan Blue 
>> wrote:
>>
>>> Comments inline. I've written up what I'm proposing with a bit more
>>> detail.
>>>
>>> On Tue, Sep 26, 2017 at 11:17 AM, Wenchen Fan 
>>> wrote:
>>>
 I'm trying to give a summary:

 Ideally the data source API should only deal with data, not metadata. But
 one key problem is that Spark still needs to support data sources without a
 metastore, e.g. file format data sources.

 For this kind of data source, users have to pass the metadata
 information like partitioning/bucketing to every write action of a
 "table" (or other identifier, like the path of a file format data source), and
 it's the user's responsibility to make sure this metadata information is
 consistent. If it's inconsistent, the behavior is undefined; different data
 sources may have different behaviors.

>>>
>>> Agreed so far. One minor point is that we currently throw an exception
>>> if you try to configure, for example, partitioning and also use
>>> `insertInto`.
>>>
>>>
 If we agree on this, then the data source write API should have a way to
 pass this metadata information, and I think using data source options is
 a good choice because it's the most implicit way and doesn't require new
 APIs.

>>>
>>> What I don't understand is why we "can't avoid this problem" unless you
>>> mean the last point, that we have to support this. I don't think that using
>>> data source options is a good choice, but maybe I don't understand the
>>> alternatives. Here's a straw-man version of what I'm proposing so you can
>>> tell me what's wrong with it or why options are a better choice.
>>>
>>> I'm 

Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-29 Thread Ryan Blue
+1 (non-binding)

Checked all signatures/checksums for binaries and source, spot-checked
maven artifacts. Thanks for fixing the signatures, Holden!

On Fri, Sep 29, 2017 at 8:25 AM, Holden Karau  wrote:

> As a follow-up, the JIRA for this is at
> https://issues.apache.org/jira/browse/SPARK-22167
>
> On Fri, Sep 29, 2017 at 2:50 AM, Holden Karau 
> wrote:
>
>> This vote is canceled and will be replaced with an RC3 once Felix and I
>> figure out the R packaging issue.
>>
>> On Fri, Sep 29, 2017 at 1:03 AM Felix Cheung 
>> wrote:
>>
>>> -1
>>>
>>> (Sorry) spark-2.1.2-bin-hadoop2.7.tgz is missing the R directory, not
>>> sure why yet.
>>>
>>> Tested on multiple platforms as a source package (against the 2.1.1 jar);
>>> seemed fine except for this WARNING on R-devel
>>>
>>> * checking for code/documentation mismatches ... WARNING
>>> Codoc mismatches from documentation object 'attach':
>>> attach
>>>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>>>  backtick = FALSE), warn.conflicts = TRUE)
>>>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>>>  warn.conflicts = TRUE)
>>>   Mismatches in argument default values:
>>> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
>>> deparse(substitute(what))
>>>
>>> Checked the latest release, R 3.4.1, and the signature change wasn't
>>> there. This likely indicates an upcoming change in the next R release that
>>> could incur this new warning when we attempt to publish the package.
>>>
>>> Not sure what we can do now, since we work with multiple versions of R
>>> and they will then have different signatures.
>>> --
>>> *From:* Luciano Resende 
>>> *Sent:* Thursday, September 28, 2017 10:29:18 PM
>>> *To:* Holden Karau
>>> *Cc:* dev@spark.apache.org
>>>
>>> *Subject:* Re: [VOTE] Spark 2.1.2 (RC2)
>>> +1 (non-binding)
>>>
>>> Minor comments:
>>> The apache infra has a staging repository to add release candidates, and
>>> it might be better/simpler to use that instead of home.a.o. See
>>> https://dist.apache.org/repos/dist/dev/spark/.
>>>
>>>
>>>
>>> On Tue, Sep 26, 2017 at 9:47 PM, Holden Karau 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.1.2. The vote is open until Wednesday October 4th at 23:59
 PST and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.1.2
 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see https://spark.apache.org/

 The tag to be voted on is v2.1.2-rc2
  (fabbb7f59e47590114366d14e15fbbff8c88593c)

 List of JIRA tickets resolved in this release can be found with this
 filter.
 

 The release files, including signatures, digests, etc. can be found at:
 https://home.apache.org/~holden/spark-2.1.2-rc2-bin/

 Release artifacts are signed with a key from:
 https://people.apache.org/~holden/holdens_keys.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1251

 The documentation corresponding to this release can be found at:
 https://people.apache.org/~holden/spark-2.1.2-rc2-docs/


 *FAQ*

 *How can I help test this release?*

 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running it on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks; in Java/Scala
 you can add the staging repository to your project's resolvers and test with
 the RC (make sure to clean up the artifact cache before/after so you
 don't end up building with an out-of-date RC going forward).

 *What should happen to JIRA tickets still targeting 2.1.2?*

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should be
 worked on immediately. Everything else please retarget to 2.1.3.

 *But my bug isn't fixed!??!*

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from 2.1.1. That
 being said, if there is something which is a regression from 2.1.1 that
 has not been correctly targeted, please ping a committer to help target the
 issue (you can see the open issues listed as impacting Spark 2.1.1 &
 2.1.2
 

Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-29 Thread Holden Karau
As a follow-up, the JIRA for this is at
https://issues.apache.org/jira/browse/SPARK-22167

On Fri, Sep 29, 2017 at 2:50 AM, Holden Karau  wrote:

> This vote is canceled and will be replaced with an RC3 once Felix and I
> figure out the R packaging issue.
>
> On Fri, Sep 29, 2017 at 1:03 AM Felix Cheung 
> wrote:
>
>> -1
>>
>> (Sorry) spark-2.1.2-bin-hadoop2.7.tgz is missing the R directory, not
>> sure why yet.
>>
>> Tested on multiple platforms as a source package (against the 2.1.1 jar);
>> seemed fine except for this WARNING on R-devel
>>
>> * checking for code/documentation mismatches ... WARNING
>> Codoc mismatches from documentation object 'attach':
>> attach
>>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>>  backtick = FALSE), warn.conflicts = TRUE)
>>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>>  warn.conflicts = TRUE)
>>   Mismatches in argument default values:
>> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
>> deparse(substitute(what))
>>
>> Checked the latest release, R 3.4.1, and the signature change wasn't there.
>> This likely indicates an upcoming change in the next R release that could
>> incur this new warning when we attempt to publish the package.
>>
>> Not sure what we can do now, since we work with multiple versions of R, and
>> they will then have different signatures.
>> --
>> *From:* Luciano Resende 
>> *Sent:* Thursday, September 28, 2017 10:29:18 PM
>> *To:* Holden Karau
>> *Cc:* dev@spark.apache.org
>>
>> *Subject:* Re: [VOTE] Spark 2.1.2 (RC2)
>> +1 (non-binding)
>>
>> Minor comments:
>> The apache infra has a staging repository to add release candidates, and
>> it might be better/simpler to use that instead of home.a.o. See
>> https://dist.apache.org/repos/dist/dev/spark/.
>>
>>
>>
>> On Tue, Sep 26, 2017 at 9:47 PM, Holden Karau 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.1.2. The vote is open until Wednesday October 4th at 23:59
>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.1.2
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v2.1.2-rc2
>>>  (fabbb7f59e47590114366d14e15fbbff8c88593c)
>>>
>>> List of JIRA tickets resolved in this release can be found with this
>>> filter.
>>> 
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://home.apache.org/~holden/spark-2.1.2-rc2-bin/
>>>
>>> Release artifacts are signed with a key from:
>>> https://people.apache.org/~holden/holdens_keys.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1251
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://people.apache.org/~holden/spark-2.1.2-rc2-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running it on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala
>>> you can add the staging repository to your project's resolvers and test with
>>> the RC (make sure to clean up the artifact cache before/after so you
>>> don't end up building with an out-of-date RC going forward).
>>>
>>> *What should happen to JIRA tickets still targeting 2.1.2?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.1.3.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1. That being said
>>> if there is something which is a regression from 2.1.1 that has not
>>> been correctly targeted, please ping a committer to help target the issue
>>> (you can see the open issues listed as impacting Spark 2.1.1 & 2.1.2
>>> 
>>> )
>>>
>>> *What are the unresolved* issues targeted for 2.1.2
>>> 

Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-29 Thread Holden Karau
This vote is canceled and will be replaced with an RC3 once Felix and I
figure out the R packaging issue.

On Fri, Sep 29, 2017 at 1:03 AM Felix Cheung 
wrote:

> -1
>
> (Sorry) spark-2.1.2-bin-hadoop2.7.tgz is missing the R directory, not sure
> why yet.
>
> Tested on multiple platforms as a source package (against the 2.1.1 jar);
> seemed fine except for this WARNING on R-devel
>
> * checking for code/documentation mismatches ... WARNING
> Codoc mismatches from documentation object 'attach':
> attach
>   Code: function(what, pos = 2L, name = deparse(substitute(what),
>  backtick = FALSE), warn.conflicts = TRUE)
>   Docs: function(what, pos = 2L, name = deparse(substitute(what)),
>  warn.conflicts = TRUE)
>   Mismatches in argument default values:
> Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs:
> deparse(substitute(what))
>
> Checked the latest release, R 3.4.1, and the signature change wasn't there.
> This likely indicates an upcoming change in the next R release that could
> incur this new warning when we attempt to publish the package.
>
> Not sure what we can do now, since we work with multiple versions of R, and
> they will then have different signatures.
> --
> *From:* Luciano Resende 
> *Sent:* Thursday, September 28, 2017 10:29:18 PM
> *To:* Holden Karau
> *Cc:* dev@spark.apache.org
>
> *Subject:* Re: [VOTE] Spark 2.1.2 (RC2)
> +1 (non-binding)
>
> Minor comments:
> The apache infra has a staging repository to add release candidates, and
> it might be better/simpler to use that instead of home.a.o. See
> https://dist.apache.org/repos/dist/dev/spark/.
>
>
>
> On Tue, Sep 26, 2017 at 9:47 PM, Holden Karau 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.2. The vote is open until Wednesday October 4th at 23:59 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.2
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.1.2-rc2
>> (fabbb7f59e47590114366d14e15fbbff8c88593c)
>>
>> List of JIRA tickets resolved in this release can be found with this
>> filter.
>> 
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://home.apache.org/~holden/spark-2.1.2-rc2-bin/
>>
>> Release artifacts are signed with a key from:
>> https://people.apache.org/~holden/holdens_keys.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1251
>>
>> The documentation corresponding to this release can be found at:
>> https://people.apache.org/~holden/spark-2.1.2-rc2-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running it on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks; in Java/Scala you
>> can add the staging repository to your project's resolvers and test with the
>> RC (make sure to clean up the artifact cache before/after so you don't
>> end up building with an out-of-date RC going forward).
>>
>> *What should happen to JIRA tickets still targeting 2.1.2?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.1.3.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1. That being said
>> if there is something which is a regression from 2.1.1 that has not been
>> correctly targeted, please ping a committer to help target the issue (you
>> can see the open issues listed as impacting Spark 2.1.1 & 2.1.2
>> 
>> )
>>
>> *What are the unresolved* issues targeted for 2.1.2
>> 
>> ?
>>
>> At this time there are no open unresolved issues.
>>
>> *Is there anything different about this release?*
>>
>> This is the first release in a while not built on the AMPLAB Jenkins. This
>> is good because it means future 

Structured Streaming and Hive

2017-09-29 Thread HanPan
Hi guys,

 

 I'm new to Spark Structured Streaming. I'm using 2.1.0, and my scenario
is reading a specific topic from Kafka, doing some data mining tasks, then
saving the result dataset to Hive.

 While writing data to Hive, it somehow seems like it's not supported yet, and
I tried this:



   It runs OK, but there is no result in Hive.

 

   Any idea how to write the stream result to Hive?
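
   (For reference, one commonly suggested workaround on 2.1 is to write the
stream as Parquet files and point an external Hive table at that path; a
minimal, untested sketch, with made-up paths, assuming the mining result is a
streaming Dataset named `result`:)

// Write the streaming result as Parquet files under a fixed path.
val query = result.writeStream
  .format("parquet")
  .outputMode("append")
  .option("path", "/warehouse/mydb.db/mining_result")
  .option("checkpointLocation", "/checkpoints/mining_result")
  .start()

// One-time setup so Hive can read those files (run in spark-sql or beeline):
// CREATE EXTERNAL TABLE mydb.mining_result ( ... columns ... )
//   STORED AS PARQUET LOCATION '/warehouse/mydb.db/mining_result';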

 

Thanks

Pan

 

 



Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-29 Thread Felix Cheung
-1

(Sorry) spark-2.1.2-bin-hadoop2.7.tgz is missing the R directory, not sure why 
yet.

Tested on multiple platforms as a source package (against the 2.1.1 jar); seemed fine 
except for this WARNING on R-devel

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'attach':
attach
  Code: function(what, pos = 2L, name = deparse(substitute(what),
 backtick = FALSE), warn.conflicts = TRUE)
  Docs: function(what, pos = 2L, name = deparse(substitute(what)),
 warn.conflicts = TRUE)
  Mismatches in argument default values:
Name: 'name' Code: deparse(substitute(what), backtick = FALSE) Docs: 
deparse(substitute(what))

Checked the latest release, R 3.4.1, and the signature change wasn't there. This 
likely indicates an upcoming change in the next R release that could incur this 
new warning when we attempt to publish the package.

Not sure what we can do now, since we work with multiple versions of R, and they 
will then have different signatures.

From: Luciano Resende 
Sent: Thursday, September 28, 2017 10:29:18 PM
To: Holden Karau
Cc: dev@spark.apache.org
Subject: Re: [VOTE] Spark 2.1.2 (RC2)

+1 (non-binding)

Minor comments:
The apache infra has a staging repository to add release candidates, and it 
might be better/simpler to use that instead of home.a.o. See 
https://dist.apache.org/repos/dist/dev/spark/.



On Tue, Sep 26, 2017 at 9:47 PM, Holden Karau 
> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.1.2. 
The vote is open until Wednesday October 4th at 23:59 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.2
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is 
v2.1.2-rc2 
(fabbb7f59e47590114366d14e15fbbff8c88593c)

List of JIRA tickets resolved in this release can be found with this 
filter.

The release files, including signatures, digests, etc. can be found at:
https://home.apache.org/~holden/spark-2.1.2-rc2-bin/

Release artifacts are signed with a key from:
https://people.apache.org/~holden/holdens_keys.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1251

The documentation corresponding to this release can be found at:
https://people.apache.org/~holden/spark-2.1.2-rc2-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running it on this release candidate, then reporting 
any regressions.

If you're working in PySpark you can set up a virtual env and install the 
current RC and see if anything important breaks; in Java/Scala you can add 
the staging repository to your project's resolvers and test with the RC (make 
sure to clean up the artifact cache before/after so you don't end up building 
with an out-of-date RC going forward).
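
For example, a minimal build.sbt sketch for resolving the RC from the staging 
repository (the resolver URL is the one listed above; the project name and Scala 
version are placeholders):

// build.sbt sketch for smoke-testing the RC against an existing workload.
name := "spark-2.1.2-rc2-smoke-test"
scalaVersion := "2.11.8"

// Staging repository from this vote thread.
resolvers += "Apache Spark 2.1.2 RC2 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1251"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.2"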

What should happen to JIRA tickets still targeting 2.1.2?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.1.3.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.1. That being said, if there is 
something which is a regression from 2.1.1 that has not been correctly targeted, 
please ping a committer to help target the issue (you can see the open issues 
listed as impacting Spark 2.1.1 & 
2.1.2)

What are the unresolved issues targeted for 
2.1.2?

At this time there are no open unresolved issues.

Is there anything different about this release?

This is the first release in a while not built on the AMPLAB Jenkins. This is 
good because it means future releases can more easily be built and signed 
securely (and I've been updating the documentation in 
https://github.com/apache/spark-website/pull/66 as I progress); however, the 
chances of a mistake are higher with any change like this. If there is something 
you normally take for granted as correct when checking a release, please 
double-check this time :)

Should I be committing code to branch-2.1?

Thanks for asking!