Re: Chaining Spark Streaming Jobs

2017-08-23 Thread Michael Armbrust
If you use structured streaming and the file sink, you can have a subsequent stream read using the file source. This will maintain exactly once processing even if there are hiccups or failures. On Mon, Aug 21, 2017 at 2:02 PM, Sunita Arvind wrote: > Hello Spark Experts,
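
For anyone wondering where the exactly-once guarantee comes from: as I understand it, the file sink records each completed batch ID and its output files in a metadata log under the output directory, and a file source reading that directory can rely on that log to see only committed files. Below is a toy, in-memory Scala model of that idea, not Spark's implementation. `CommitLog`, `SinkSketch`, and the file naming are hypothetical. A replayed batch ID that is already in the log is skipped, and files from an attempt that crashed before committing never become visible downstream.

```scala
import scala.collection.mutable

// Toy, in-memory model (hypothetical classes, not Spark's real ones) of a file
// sink that commits each micro-batch to a log keyed by batch ID, and of a
// downstream reader that only trusts files listed in that log.
object ChainedSinkSketch {

  final class CommitLog {
    private val entries = mutable.LinkedHashMap.empty[Long, Seq[String]]

    def latestBatchId: Long = if (entries.isEmpty) -1L else entries.keys.max

    def commit(batchId: Long, files: Seq[String]): Unit = entries.update(batchId, files)

    // What a downstream stream would read: only files from committed batches.
    def committedFiles: Seq[String] = entries.values.flatten.toSeq
  }

  final class SinkSketch(log: CommitLog) {
    // Every file physically written to the output directory, committed or not.
    val writtenFiles: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
    private var attempt = 0

    def addBatch(batchId: Long, rows: Seq[String], crashBeforeCommit: Boolean = false): Unit =
      if (batchId > log.latestBatchId) { // already-committed batches are skipped on replay
        attempt += 1
        val files = rows.indices.map(i => s"part-$batchId-attempt$attempt-$i")
        writtenFiles ++= files
        // Committing is the last step; a crash before it leaves orphan files behind.
        if (!crashBeforeCommit) log.commit(batchId, files)
      }
  }

  def main(args: Array[String]): Unit = {
    val log = new CommitLog
    val sink = new SinkSketch(log)
    sink.addBatch(0, Seq("a", "b"))
    sink.addBatch(1, Seq("c"), crashBeforeCommit = true) // attempt dies before committing
    sink.addBatch(1, Seq("c"))                           // restart re-runs batch 1
    sink.addBatch(1, Seq("c"))                           // duplicate replay: no-op
    println(s"on disk:   ${sink.writtenFiles.mkString(", ")}")
    println(s"committed: ${log.committedFiles.mkString(", ")}")
  }
}
```

In this run four files end up on disk but only three are committed; the orphan from the crashed attempt is what the downstream reader must ignore.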

Re: Question on how to get appended data from structured streaming

2017-08-20 Thread Michael Armbrust
What is your end goal? Right now the foreach writer is the way to do arbitrary processing on the data produced by various output modes. On Sun, Aug 20, 2017 at 12:23 PM, Yanpeng Lin wrote: > Hello, > > I am new to Spark. > It would be appreciated if anyone could help me

Re: Restart streaming query spark 2.1 structured streaming

2017-08-15 Thread Michael Armbrust
See https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing Though I think that this currently doesn't work with the console sink. On Tue, Aug 15, 2017 at 9:40 AM, purna pradeep wrote: > Hi,

Re: [SS] watermark, eventTime and "StreamExecution: Streaming query made progress"

2017-08-11 Thread Michael Armbrust
The point here is to tell you what watermark value was used when executing this batch. You don't know the new watermark until the batch is over and we don't want to do two passes over the data. In general the semantics of the watermark are designed to be conservative (i.e. just because data is
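
To make the "used vs. new watermark" distinction concrete, here is a toy Scala model (not Spark's code) of a micro-batch loop. The watermark reported for a batch is the one fixed before it ran; only after the batch has been processed is its max event time known, so the new value (max event time seen minus the delay threshold, never moving backwards) takes effect in the next batch. The starting value of 0 and the millisecond units are assumptions of the sketch; `delayMs` stands in for the threshold passed to `withWatermark`.

```scala
// Toy model (not Spark's code) of a micro-batch engine that reports, for each
// batch, the watermark it *used*. That value is fixed before the batch runs; the
// next watermark is only known after the batch's max event time has been seen.
object WatermarkSketch {
  final case class BatchReport(batchId: Int, watermarkUsed: Long, maxEventTimeSoFar: Long)

  // `delayMs` plays the role of the delay threshold given to withWatermark.
  def run(batches: Seq[Seq[Long]], delayMs: Long): List[BatchReport] = {
    var watermark = 0L          // assumed initial watermark (epoch millis 0)
    var maxSeen = Long.MinValue // max event time observed across all batches so far
    batches.toList.zipWithIndex.map { case (eventTimes, id) =>
      val used = watermark // the watermark in effect while this batch executes
      if (eventTimes.nonEmpty) maxSeen = math.max(maxSeen, eventTimes.max)
      // Advance only once the batch is done, and never move backwards.
      if (maxSeen != Long.MinValue) watermark = math.max(watermark, maxSeen - delayMs)
      BatchReport(id, used, maxSeen)
    }
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq(10000L, 12000L), Seq(11000L, 25000L), Seq(20000L))
    run(batches, delayMs = 5000L).foreach(println)
  }
}
```

With the batches in `main` and a 5-second delay, the reported watermarks are 0, 7000 and 20000: each batch sees the value computed from the batches before it.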

Re: Question about 'Structured Streaming'

2017-08-08 Thread Michael Armbrust
> > 1) Parsing data/Schema creation: The Bro IDS logs have an 8-line header > that contains the 'schema' for the data, each log http/dns/etc will have > different columns with different data types. So would I create a specific > CSV reader inherited from the general one? Also I'm assuming this

Re: Question about 'Structured Streaming'

2017-08-08 Thread Michael Armbrust
Cool stuff! A pattern I have seen is to use our CSV/TSV or JSON support to read bro logs, rather than a Python library. This is likely to have much better performance since we can do all of the parsing on the JVM without having to flow it through an external Python process. On Tue, Aug 8, 2017 at
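
As a rough illustration of the header-parsing step from the question, the Scala sketch below turns the `#fields` and `#types` header lines into (name, type) pairs. `BroHeaderSketch` and its Bro-to-Spark type mapping are hypothetical. From those pairs you could build a `StructType` and hand it to Spark's CSV reader (`spark.read.schema(...)`) with the `sep` option set to a tab and the `comment` option set to `#`, so the parsing stays on the JVM as suggested above.

```scala
// Hypothetical helper: derive (column name, Spark SQL type name) pairs from the
// "#fields" and "#types" lines of a Bro log header. The type mapping is an
// assumption for illustration, not an official Bro-to-Spark mapping.
object BroHeaderSketch {
  private val typeMap: Map[String, String] = Map(
    "time" -> "double", "interval" -> "double", "double" -> "double",
    "count" -> "long", "int" -> "long", "port" -> "int", "bool" -> "boolean"
  )

  // Header lines are tab-separated, e.g. "#fields<TAB>ts<TAB>uid<TAB>...".
  private def valuesOf(headerLines: Seq[String], tag: String): Seq[String] =
    headerLines
      .find(_.startsWith(tag + "\t"))
      .map(_.split("\t").toSeq.drop(1))
      .getOrElse(Seq.empty)

  def schemaFromHeader(headerLines: Seq[String]): Seq[(String, String)] = {
    val names = valuesOf(headerLines, "#fields")
    val types = valuesOf(headerLines, "#types")
    require(names.size == types.size, "#fields and #types must have the same length")
    // Types missing from the map (string, addr, enum, set[...], ...) fall back to string.
    names.zip(types).map { case (name, tpe) => name -> typeMap.getOrElse(tpe, "string") }
  }

  def main(args: Array[String]): Unit = {
    val header = Seq("#path\tconn", "#fields\tts\tuid\tid.orig_p", "#types\ttime\tstring\tport")
    schemaFromHeader(header).foreach { case (n, t) => println(s"$n: $t") }
  }
}
```

Note that Bro column names such as `id.orig_h` contain dots, so they need backticks when referenced in Spark SQL expressions.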

Re: [SS] Console sink not supporting recovering from checkpoint location? Why?

2017-08-07 Thread Michael Armbrust
I think there is really no good reason for this limitation. On Mon, Aug 7, 2017 at 2:58 AM, Jacek Laskowski wrote: > Hi, > > While exploring checkpointing with kafka source and console sink I've > got the exception: > > // today's build from the master > scala> spark.version >

Re: Thoughts on release cadence?

2017-07-31 Thread Michael Armbrust
+1, should we update https://spark.apache.org/versioning-policy.html ? On Sun, Jul 30, 2017 at 3:34 PM, Reynold Xin wrote: > This is reasonable ... +1 > > > On Sun, Jul 30, 2017 at 2:19 AM, Sean Owen wrote: > >> The project had traditionally posted some

Re: how to convert the binary from Kafka to string please

2017-07-24 Thread Michael Armbrust
There are end-to-end examples of using Kafka in this blog: https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html On Sun, Jul 23, 2017 at 7:44 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all > > I want to change the binary from

Re: custom joins on dataframe

2017-07-23 Thread Michael Armbrust
> > left.join(right, my_fuzzy_udf(left("cola"), right("cola"))) > While this could work, the problem will be that we'll have to check every possible combination of tuples from left and right using your UDF. It would be best if you could somehow partition the problem so that we could reduce the
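
One common way to "partition the problem" is blocking: derive a cheap key from each side and only run the fuzzy predicate on rows that share a key. The plain-Scala sketch below compares that against checking every pair; `blockKey` (first letter) and `fuzzyMatch` (edit distance of at most 1) are hypothetical stand-ins for your own logic. In Spark the same idea would likely be expressed by adding an equality on the derived key next to the UDF in the join condition, which lets the planner use a shuffle-based equi-join instead of comparing every combination. The trade-off is recall: pairs whose keys differ are never compared.

```scala
// Sketch of blocking for a fuzzy join. `blockKey` and `fuzzyMatch` are
// hypothetical stand-ins for a real key function and similarity predicate.
object BlockingJoinSketch {
  // Hypothetical cheap key: first character, lowercased.
  def blockKey(s: String): String = s.take(1).toLowerCase

  // Hypothetical fuzzy predicate: Levenshtein distance of at most 1.
  def fuzzyMatch(a: String, b: String): Boolean = levenshtein(a.toLowerCase, b.toLowerCase) <= 1

  // Classic dynamic-programming edit distance.
  def levenshtein(a: String, b: String): Int = {
    val dp = Array.tabulate(a.length + 1, b.length + 1)((i, j) => if (i == 0) j else if (j == 0) i else 0)
    for (i <- 1 to a.length; j <- 1 to b.length) {
      val cost = if (a(i - 1) == b(j - 1)) 0 else 1
      dp(i)(j) = math.min(math.min(dp(i - 1)(j) + 1, dp(i)(j - 1) + 1), dp(i - 1)(j - 1) + cost)
    }
    dp(a.length)(b.length)
  }

  // Checks every (left, right) pair; returns matches and number of predicate calls.
  def crossJoin(left: Seq[String], right: Seq[String]): (Seq[(String, String)], Int) = {
    val pairs = for (l <- left; r <- right) yield (l, r)
    (pairs.filter { case (l, r) => fuzzyMatch(l, r) }, pairs.size)
  }

  // Only compares rows whose block keys are equal.
  def blockedJoin(left: Seq[String], right: Seq[String]): (Seq[(String, String)], Int) = {
    val rightByKey = right.groupBy(blockKey)
    val pairs = for {
      l <- left
      r <- rightByKey.getOrElse(blockKey(l), Seq.empty)
    } yield (l, r)
    (pairs.filter { case (l, r) => fuzzyMatch(l, r) }, pairs.size)
  }

  def main(args: Array[String]): Unit = {
    val left = Seq("spark", "kafka", "flink")
    val right = Seq("sparc", "kafca", "flank", "storm")
    val (crossMatches, crossChecks) = crossJoin(left, right)
    val (blockedMatches, blockedChecks) = blockedJoin(left, right)
    println(s"cross product: $crossChecks checks, matches = $crossMatches")
    println(s"blocked:       $blockedChecks checks, matches = $blockedMatches")
  }
}
```

For the inputs in `main`, the cross product runs 12 comparisons and the blocked version 4, with the same 3 matches.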

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Michael Armbrust
Here is an overview of how to work with complex JSON in Spark: https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html (works in streaming and batch) On Tue, Jul 18, 2017 at 10:29 AM, Riccardo Ferrari wrote: > What's

[ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-11 Thread Michael Armbrust
Hi all, Apache Spark 2.2.0 is the third release of the Spark 2.x line. This release removes the experimental tag from Structured Streaming. In addition, this release focuses on usability, stability, and polish, resolving over 1100 tickets. We'd like to thank our contributors and users for their

Re: Event time aggregation is possible in Spark Streaming ?

2017-07-10 Thread Michael Armbrust
Event-time aggregation is only supported in Structured Streaming. On Sat, Jul 8, 2017 at 4:18 AM, Swapnil Chougule wrote: > Hello, > > I want to know whether event time aggregation is possible in Spark Streaming. I could > see it's possible in structured streaming. As I am working

Re: Union of 2 streaming data frames

2017-07-10 Thread Michael Armbrust
> going to be out soon? Do you have some sort of ETA? > > > > *From: *"Lalwani, Jayesh" <jayesh.lalw...@capitalone.com> > *Date: *Friday, July 7, 2017 at 5:46 PM > *To: *Michael Armbrust <mich...@databricks.com> > > *Cc: *"user@spark.apache.or

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-07 Thread Michael Armbrust
This vote passes! I'll follow up with the release on Monday. +1: Michael Armbrust (binding) Kazuaki Ishizaki Sean Owen (binding) Joseph Bradley (binding) Ricardo Almeida Herman van Hövell tot Westerflier (binding) Yanbo Liang Nick Pentreath (binding) Wenchen Fan (binding) Sameer Agarwal Denny Lee

Re: Union of 2 streaming data frames

2017-07-07 Thread Michael Armbrust
pache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:244) > at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerEx

[jira] [Updated] (SPARK-20441) Within the same streaming query, one StreamingRelation should only be transformed to one StreamingExecutionRelation

2017-07-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-20441: - Affects Version/s: (was: 2.2.0) > Within the same streaming query,

[jira] [Updated] (SPARK-20441) Within the same streaming query, one StreamingRelation should only be transformed to one StreamingExecutionRelation

2017-07-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-20441: - Fix Version/s: 2.2.0 > Within the same streaming query, one StreamingRelation sho

Re: Union of 2 streaming data frames

2017-07-07 Thread Michael Armbrust
df.union(df2) should be supported when both DataFrames are created from a streaming source. What error are you seeing? On Fri, Jul 7, 2017 at 11:27 AM, Lalwani, Jayesh < jayesh.lalw...@capitalone.com> wrote: > In structured streaming, Is there a way to Union 2 streaming data frames? > Are there

Re: If I pass raw SQL string to dataframe do I still get the Spark SQL optimizations?

2017-07-06 Thread Michael Armbrust
It goes through the same optimization pipeline. More in this video. On Thu, Jul 6, 2017 at 5:28 PM, kant kodali wrote: > HI All, > > I am wondering If I pass a raw SQL string to dataframe do I still get the > Spark SQL optimizations? why

[jira] [Updated] (SPARK-21267) Improvements to the Structured Streaming programming guide

2017-07-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-21267: - Target Version/s: (was: 2.2.0) > Improvements to the Structured Streaming programm

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-06-30 Thread Michael Armbrust
I'll kick off the vote with a +1. On Fri, Jun 30, 2017 at 6:44 PM, Michael Armbrust <mich...@databricks.com> wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00 PST and > passe

[VOTE] Apache Spark 2.2.0 (RC6)

2017-06-30 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-06-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070525#comment-16070525 ] Michael Armbrust commented on SPARK-18057: -- We should upgrade. Now that Kafka has a good

Re: Interesting Stateful Streaming question

2017-06-30 Thread Michael Armbrust
This does sound like a good use case for that feature. Note that Spark 2.2 adds a similar [flat]MapGroupsWithState operation to structured streaming. Stay tuned for a blog post on that! On Thu, Jun 29, 2017 at 6:11 PM, kant kodali wrote: > Is mapWithState an answer for

[jira] [Commented] (SPARK-15533) Deprecate Dataset.explode

2017-06-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070308#comment-16070308 ] Michael Armbrust commented on SPARK-15533: -- Just include the other columns too {{df.select($"

[jira] [Updated] (SPARK-21253) Cannot fetch big blocks to disk

2017-06-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-21253: - Target Version/s: 2.2.0 > Cannot fetch big blocks to d

[jira] [Assigned] (SPARK-21253) Cannot fetch big blocks to disk

2017-06-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-21253: Assignee: Shixiong Zhu > Cannot fetch big blocks to d

Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-26 Thread Michael Armbrust
> >> +1 (binding) >> >> >> On Wed, 21 Jun 2017 at 01:49 Michael Armbrust <mich...@databricks.com> >> wrote: >> >>> I will kick off the voting with a +1. >>> >>> On Tue, Jun 20, 2017 at 4:49 PM, Michael Armbrust < >>> mic

[jira] [Commented] (SPARK-21110) Structs should be usable in inequality filters

2017-06-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061398#comment-16061398 ] Michael Armbrust commented on SPARK-21110: -- It seems if you can call {{min}} and {{max

[jira] [Updated] (SPARK-21110) Structs should be usable in inequality filters

2017-06-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-21110: - Target Version/s: 2.3.0 > Structs should be usable in inequality filt

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-21 Thread Michael Armbrust
particularly notice not having to spend a solid 2-3 weeks of time QAing >>>>>>>> (unlike in earlier Spark releases). One other point not mentioned >>>>>>>> above: I >>>>>>>> think they serve as a very helpful reminder/training for the community >>>

Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-20 Thread Michael Armbrust
I will kick off the voting with a +1. On Tue, Jun 20, 2017 at 4:49 PM, Michael Armbrust <mich...@databricks.com> wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.0. The vote is open until Friday, June 23rd, 2017 at 18:00 PST and > passe

[VOTE] Apache Spark 2.2.0 (RC5)

2017-06-20 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Friday, June 23rd, 2017 at 18:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because

Re: org.apache.spark.sql.types missing from spark-sql_2.11-2.1.1.jar?

2017-06-20 Thread Michael Armbrust
It's in the spark-catalyst_2.11-2.1.1.jar since the logical query plans and optimization also need to know about types. On Tue, Jun 20, 2017 at 1:14 PM, Jean Georges Perrin wrote: > Hey all, > > i was giving a run to 2.1.1 and got an error on one of my test program: > > package

Re: how many topics spark streaming can handle

2017-06-19 Thread Michael Armbrust
I don't think that there is really a Spark-specific limit here. It would be a function of the size of your spark / kafka clusters and the type of processing you are trying to do. On Mon, Jun 19, 2017 at 12:00 PM, Ashok Kumar wrote: > Hi Gurus, > > Within one Spark

Re: cannot call explain or show on dataframe in structured streaming addBatch dataframe

2017-06-19 Thread Michael Armbrust
There is a little bit of weirdness to how we override the default query planner to replace it with an incrementalizing planner. As such, calling any operation that changes the query plan (such as a LIMIT) would cause it to revert to the batch planner and return the wrong answer. We should fix

Re: the scheme in stream reader

2017-06-19 Thread Michael Armbrust
The socket source can't know how to parse your data. I think the right thing would be for it to throw an exception saying that you can't set the schema here. Would you mind opening a JIRA ticket? If you are trying to parse data from something like JSON then you should use `from_json` on the

[jira] [Updated] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-06-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-21133: - Target Version/s: 2.2.0 Priority: Blocker (was: Major) Description

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054596#comment-16054596 ] Michael Armbrust commented on SPARK-20928: -- Hi Cody, I do plan to flesh this out with the other

Re: Nested "struct" function call creates a compilation error in Spark SQL

2017-06-15 Thread Michael Armbrust
> you think ? > > Regards, > > Olivier. > > > 2017-06-15 21:08 GMT+02:00 Michael Armbrust <mich...@databricks.com>: > >> Which version of Spark? If it's recent I'd open a JIRA. >> >> On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot < >>

Re: Nested "struct" function call creates a compilation error in Spark SQL

2017-06-15 Thread Michael Armbrust
Which version of Spark? If it's recent I'd open a JIRA. On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > when we create recursive calls to "struct" (up to 5 levels) for extending > a complex datastructure we end up with the following

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-15 Thread Michael Armbrust
anks! > > On Wed, Jun 14, 2017 at 5:32 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> This is a good question. I really like using Kafka as a centralized source >> for streaming data in an organization and, with Spark 2.2, we have full >> support

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-14 Thread Michael Armbrust
This is a good question. I really like using Kafka as a centralized source for streaming data in an organization and, with Spark 2.2, we have full support for reading and writing data to/from Kafka in both streaming and batch

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Michael Armbrust
for >>>>>>> rigor in development. Since we instituted QA JIRAs, contributors have >>>>>>> been >>>>>>> a lot better about adding in docs early, rather than waiting until the >>>>>>> end >>>>>>

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
s, in general', > then I think they're superfluous at best. These aren't used consistently, > and their intent isn't actionable (i.e. it sounds like no particular > testing resolves the JIRA). They signal something that doesn't seem to > match the intent. > > Can we close the QA

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
2.2 needs to > block the release; Joseph what's the status on those? > > On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust <mich...@databricks.com> > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.2.0. The vote is open unti

[VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

[jira] [Commented] (SPARK-20980) Rename the option `wholeFile` to `multiLine` for JSON and CSV

2017-06-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037416#comment-16037416 ] Michael Armbrust commented on SPARK-20980: -- I already cut RC4, I think we may just need

Re: [VOTE] Apache Spark 2.2.0 (RC3)

2017-06-02 Thread Michael Armbrust
re, and should NEVER backport > non-bug-fix commits to an RC branch. Sorry again for the trouble! > > On Fri, Jun 2, 2017 at 2:40 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> Please vote on releasing the following candidate as Apache Spark version

[jira] [Closed] (SPARK-20737) Mechanism for cleanup hooks, for structured-streaming sinks on executor shutdown.

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust closed SPARK-20737. Resolution: Won't Fix > Mechanism for cleanup hooks, for structured-streaming si

[VOTE] Apache Spark 2.2.0 (RC3)

2017-06-02 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, June 6th, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-06-02 Thread Michael Armbrust
ussion on TIMESTAMP semantics going on the thread "SQL >> TIMESTAMP semantics vs. SPARK-18350" which might impact Spark 2.2. Should >> we make a decision there before voting on the next RC for Spark 2.2? >> >> Thanks, >> Kostas >> >> On Tue, May

[jira] [Updated] (SPARK-20065) Empty output files created for aggregation query in append mode

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-20065: - Target Version/s: 2.3.0 > Empty output files created for aggregation query in app

[jira] [Updated] (SPARK-19903) Watermark metadata is lost when using resolved attributes

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19903: - Target Version/s: 2.3.0 > Watermark metadata is lost when using resolved attribu

[jira] [Updated] (SPARK-19903) Watermark metadata is lost when using resolved attributes

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19903: - Component/s: (was: PySpark) > Watermark metadata is lost when using resol

[jira] [Updated] (SPARK-19903) Watermark metadata is lost when using resolved attributes

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19903: - Summary: Watermark metadata is lost when using resolved attributes (was: PySpark Kafka

[jira] [Updated] (SPARK-19903) PySpark Kafka streaming query output append mode not possible

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19903: - Description: PySpark example reads a Kafka stream. There is watermarking set when

[jira] [Commented] (SPARK-20002) Add support for unions between streaming and batch datasets

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035441#comment-16035441 ] Michael Armbrust commented on SPARK-20002: -- I'm not sure that we will ever support

[jira] [Resolved] (SPARK-20147) Cloning SessionState does not clone streaming query listeners

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-20147. -- Resolution: Fixed Assignee: Kunal Khamar Fix Version/s: 2.2.0

[jira] [Updated] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-20928: - Description: Given the current Source API, the minimum possible latency for any record

[jira] [Updated] (SPARK-20734) Structured Streaming spark.sql.streaming.schemaInference not handling schema changes

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-20734: - Issue Type: New Feature (was: Bug) > Structured Stream

[jira] [Updated] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-20958: - Labels: release-notes (was: ) > Roll back parquet-mr 1.8.2 to parquet-1.

[jira] [Resolved] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-20958. -- Resolution: Won't Fix Thanks everyone. Sounds like we'll just provide directions

[jira] [Commented] (SPARK-19104) CompileException with Map and Case Class in Spark 2.1.0

2017-06-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035012#comment-16035012 ] Michael Armbrust commented on SPARK-19104: -- I'm about to cut RC3 of 2.2 and there is no pull

[jira] [Updated] (SPARK-15693) Write schema definition out for file-based data sources to avoid schema inference

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-15693: - Target Version/s: 2.3.0 (was: 2.2.0) > Write schema definition out for file-based d

[jira] [Updated] (SPARK-15380) Generate code that stores a float/double value in each column from ColumnarBatch when DataFrame.cache() is used

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-15380: - Target Version/s: 2.3.0 (was: 2.2.0) > Generate code that stores a float/double va

[jira] [Updated] (SPARK-19084) conditional function: field

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19084: - Target Version/s: 2.3.0 (was: 2.2.0) > conditional function: fi

[jira] [Updated] (SPARK-15691) Refactor and improve Hive support

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-15691: - Target Version/s: 2.3.0 (was: 2.2.0) > Refactor and improve Hive supp

[jira] [Updated] (SPARK-14878) Support Trim characters in the string trim function

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-14878: - Target Version/s: 2.3.0 (was: 2.2.0) > Support Trim characters in the string t

[jira] [Updated] (SPARK-16496) Add wholetext as option for reading text in SQL.

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-16496: - Target Version/s: 2.3.0 (was: 2.2.0) > Add wholetext as option for reading text in

[jira] [Updated] (SPARK-19241) remove hive generated table properties if they are not useful in Spark

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19241: - Target Version/s: 2.3.0 (was: 2.2.0) > remove hive generated table propert

[jira] [Updated] (SPARK-16317) Add file filtering interface for FileFormat

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-16317: - Target Version/s: 2.3.0 (was: 2.2.0) > Add file filtering interface for FileFor

[jira] [Updated] (SPARK-19027) estimate size of object buffer for object hash aggregate

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19027: - Target Version/s: 2.3.0 (was: 2.2.0) > estimate size of object buffer for object h

[jira] [Updated] (SPARK-19104) CompileException with Map and Case Class in Spark 2.1.0

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19104: - Target Version/s: 2.3.0 (was: 2.2.0) > CompileException with Map and Case Cl

[jira] [Updated] (SPARK-18245) Improving support for bucketed table

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18245: - Target Version/s: 2.3.0 (was: 2.2.0) > Improving support for bucketed ta

[jira] [Updated] (SPARK-14098) Generate Java code to build CachedColumnarBatch and get values from CachedColumnarBatch when DataFrame.cache() is called

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-14098: - Target Version/s: 2.3.0 (was: 2.2.0) > Generate Java code to build CachedColumnarBa

[jira] [Updated] (SPARK-19014) support complex aggregate buffer in HashAggregateExec

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19014: - Target Version/s: 2.3.0 (was: 2.2.0) > support complex aggregate buf

[jira] [Updated] (SPARK-16011) SQL metrics include duplicated attempts

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-16011: - Target Version/s: 2.3.0 (was: 2.2.0) > SQL metrics include duplicated attem

[jira] [Updated] (SPARK-18388) Running aggregation on many columns throws SOE

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18388: - Target Version/s: 2.3.0 (was: 2.2.0) > Running aggregation on many columns throws

[jira] [Updated] (SPARK-19989) Flaky Test: org.apache.spark.sql.kafka010.KafkaSourceStressSuite

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-19989: - Target Version/s: 2.3.0 (was: 2.2.0) > Flaky Test: org.apache.spark.sql.kafka

[jira] [Updated] (SPARK-17915) Prepare ColumnVector implementation for UnsafeData

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17915: - Target Version/s: 2.3.0 (was: 2.2.0) > Prepare ColumnVector implementat

[jira] [Updated] (SPARK-18134) SQL: MapType in Group BY and Joins not working

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18134: - Target Version/s: 2.3.0 (was: 2.2.0) > SQL: MapType in Group BY and Joins not work

[jira] [Updated] (SPARK-18455) General support for correlated subquery processing

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18455: - Target Version/s: 2.3.0 (was: 2.2.0) > General support for correlated subqu

[jira] [Updated] (SPARK-15690) Fast single-node (single-process) in-memory shuffle

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-15690: - Target Version/s: 2.3.0 (was: 2.2.0) > Fast single-node (single-process) in-memory shuffle

[jira] [Updated] (SPARK-15689) Data source API v2

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-15689: - Target Version/s: 2.3.0 (was: 2.2.0) > Data source API v2

[jira] [Updated] (SPARK-13184) Support minPartitions parameter for JSON and CSV datasources as options

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13184: - Target Version/s: 2.3.0 (was: 2.2.0) > Support minPartitions parameter for JSON and CSV datasources as options

[jira] [Updated] (SPARK-13682) Finalize the public API for FileFormat

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13682: - Target Version/s: 2.3.0 (was: 2.2.0) > Finalize the public API for FileFormat

[jira] [Updated] (SPARK-9221) Support IntervalType in Range Frame

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-9221: - Target Version/s: 2.3.0 (was: 2.2.0) > Support IntervalType in Range Frame

[jira] [Updated] (SPARK-20319) Already quoted identifiers are getting wrapped with additional quotes

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-20319: - Target Version/s: 2.3.0 (was: 2.2.0) > Already quoted identifiers are getting wrapped with additional quotes

[jira] [Updated] (SPARK-9576) DataFrame API improvement umbrella ticket (in Spark 2.x)

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-9576: - Target Version/s: 2.3.0 (was: 2.2.0) > DataFrame API improvement umbrella ticket (in Spark 2.x)

[jira] [Updated] (SPARK-18394) Executing the same query twice in a row results in CodeGenerator cache misses

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18394: - Target Version/s: 2.3.0 (was: 2.2.0) > Executing the same query twice in a row results in CodeGenerator cache misses

[jira] [Updated] (SPARK-18891) Support for specific collection types

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18891: - Target Version/s: 2.3.0 (was: 2.2.0) > Support for specific collection types

[jira] [Updated] (SPARK-14543) SQL/Hive insertInto has unexpected results

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-14543: - Target Version/s: 2.3.0 (was: 2.2.0) > SQL/Hive insertInto has unexpected results

[jira] [Updated] (SPARK-17556) Executor side broadcast for broadcast joins

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17556: - Target Version/s: 2.3.0 (was: 2.2.0) > Executor side broadcast for broadcast joins

[jira] [Updated] (SPARK-15694) Implement ScriptTransformation in sql/core

2017-06-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-15694: - Target Version/s: 2.3.0 (was: 2.2.0) > Implement ScriptTransformation in sql/core
