[jira] [Commented] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-12-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762756#comment-15762756 ] Michael Armbrust commented on SPARK-17344: -- [KAFKA-4462] aims to give us backwards compatibility

[jira] [Updated] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-12-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17344: - Target Version/s: 2.1.1 > Kafka 0.8 support for Structured Streaming >

[jira] [Updated] (SPARK-18908) It's hard for the user to see the failure if StreamExecution fails to create the logical plan

2016-12-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18908: - Target Version/s: 2.1.1 > It's hard for the user to see the failure if StreamExecution

[jira] [Created] (SPARK-18932) Partial aggregation for collect_set / collect_list

2016-12-19 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18932: Summary: Partial aggregation for collect_set / collect_list Key: SPARK-18932 URL: https://issues.apache.org/jira/browse/SPARK-18932 Project: Spark

[jira] [Commented] (SPARK-5632) not able to resolve dot('.') in field name

2016-12-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752929#comment-15752929 ] Michael Armbrust commented on SPARK-5632: - Hmm, I agree that error is confusing. It does work if

[jira] [Updated] (SPARK-18084) write.partitionBy() does not recognize nested columns that select() can access

2016-12-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18084: - Target Version/s: 2.2.0 > write.partitionBy() does not recognize nested columns that

[jira] [Created] (SPARK-18891) Support for specific collection types

2016-12-15 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18891: Summary: Support for specific collection types Key: SPARK-18891 URL: https://issues.apache.org/jira/browse/SPARK-18891 Project: Spark Issue Type:

[jira] [Commented] (SPARK-5632) not able to resolve dot('.') in field name

2016-12-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752771#comment-15752771 ] Michael Armbrust commented on SPARK-5632: - If you expand the commit you'll see it's included in

[jira] [Resolved] (SPARK-12777) Dataset fields can't be Scala tuples

2016-12-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12777. -- Resolution: Fixed Fix Version/s: 2.1.0 This works in 2.1:

[jira] [Commented] (SPARK-17890) scala.ScalaReflectionException

2016-12-14 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749860#comment-15749860 ] Michael Armbrust commented on SPARK-17890: -- If I had to guess, I would guess that [this

[jira] [Updated] (SPARK-17689) _temporary files breaks the Spark SQL streaming job.

2016-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17689: - Target Version/s: 2.2.0 Description: Steps to reproduce: 1) Start a streaming

[jira] [Updated] (SPARK-18272) Test topic addition for subscribePattern on Kafka DStream and Structured Stream

2016-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18272: - Issue Type: Test (was: Bug) > Test topic addition for subscribePattern on Kafka DStream

[jira] [Updated] (SPARK-18790) Keep a general offset history of stream batches

2016-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18790: - Target Version/s: 2.1.0 > Keep a general offset history of stream batches >

[jira] [Updated] (SPARK-18796) StreamingQueryManager should not hold a lock when starting a query

2016-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18796: - Target Version/s: 2.1.0 > StreamingQueryManager should not hold a lock when starting a

[jira] [Created] (SPARK-18791) Stream-Stream Joins

2016-12-08 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18791: Summary: Stream-Stream Joins Key: SPARK-18791 URL: https://issues.apache.org/jira/browse/SPARK-18791 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-17890) scala.ScalaReflectionException

2016-12-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729905#comment-15729905 ] Michael Armbrust commented on SPARK-17890: -- Can you reproduce this with 2.1? If so, I think we

[jira] [Resolved] (SPARK-16902) Custom ExpressionEncoder for primitive array is not effective

2016-12-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-16902. -- Resolution: Not A Problem The encoder that is used is picked by Scala's implicit

[jira] [Updated] (SPARK-18754) Rename recentProgresses to recentProgress

2016-12-06 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18754: - Target Version/s: 2.1.0 > Rename recentProgresses to recentProgress >

[jira] [Created] (SPARK-18754) Rename recentProgresses to recentProgress

2016-12-06 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18754: Summary: Rename recentProgresses to recentProgress Key: SPARK-18754 URL: https://issues.apache.org/jira/browse/SPARK-18754 Project: Spark Issue

[jira] [Closed] (SPARK-18749) CLONE - checkpointLocation being set in memory streams fail after restart. Should fail fast

2016-12-06 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust closed SPARK-18749. Resolution: Invalid > CLONE - checkpointLocation being set in memory streams fail after

[jira] [Created] (SPARK-18749) CLONE - checkpointLocation being set in memory streams fail after restart. Should fail fast

2016-12-06 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18749: Summary: CLONE - checkpointLocation being set in memory streams fail after restart. Should fail fast Key: SPARK-18749 URL:

[jira] [Closed] (SPARK-17921) checkpointLocation being set in memory streams fail after restart. Should fail fast

2016-12-06 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust closed SPARK-17921. Resolution: Won't Fix > checkpointLocation being set in memory streams fail after restart.

[jira] [Updated] (SPARK-18234) Update mode in structured streaming

2016-12-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18234: - Target Version/s: 2.2.0 > Update mode in structured streaming >

[jira] [Created] (SPARK-18682) Batch Source for Kafka

2016-12-01 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18682: Summary: Batch Source for Kafka Key: SPARK-18682 URL: https://issues.apache.org/jira/browse/SPARK-18682 Project: Spark Issue Type: New Feature

[jira] [Reopened] (SPARK-18122) Fallback to Kryo for unknown classes in ExpressionEncoder

2016-11-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reopened SPARK-18122: -- I'm going to reopen this. I think the benefits outweigh the compatibility concerns. >

[jira] [Updated] (SPARK-17939) Spark-SQL Nullability: Optimizations vs. Enforcement Clarification

2016-11-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17939: - Target Version/s: 2.1.0 > Spark-SQL Nullability: Optimizations vs. Enforcement

[jira] [Updated] (SPARK-17939) Spark-SQL Nullability: Optimizations vs. Enforcement Clarification

2016-11-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17939: - Target Version/s: 2.2.0 (was: 2.1.0) > Spark-SQL Nullability: Optimizations vs.

[jira] [Created] (SPARK-18657) Persist UUID across query restart

2016-11-30 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18657: Summary: Persist UUID across query restart Key: SPARK-18657 URL: https://issues.apache.org/jira/browse/SPARK-18657 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-18588) KafkaSourceStressForDontFailOnDataLossSuite is flaky

2016-11-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18588: - Target Version/s: 2.1.0 > KafkaSourceStressForDontFailOnDataLossSuite is flaky >

[jira] [Resolved] (SPARK-16545) Structured Streaming : foreachSink creates the Physical Plan multiple times per TriggerInterval

2016-11-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-16545. -- Resolution: Later > Structured Streaming : foreachSink creates the Physical Plan

[jira] [Updated] (SPARK-18655) Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18655: - Fix Version/s: (was: 2.1.0) > Ignore Structured Streaming 2.0.2 logs in history

[jira] [Updated] (SPARK-18655) Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18655: - Target Version/s: 2.1.0 > Ignore Structured Streaming 2.0.2 logs in history server >

[jira] [Resolved] (SPARK-18516) Separate instantaneous state from progress performance statistics

2016-11-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18516. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15954

[jira] [Reopened] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reopened SPARK-18475: -- > Be able to provide higher parallelization for StructuredStreaming Kafka Source >

[jira] [Commented] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706692#comment-15706692 ] Michael Armbrust commented on SPARK-18475: -- I think that this suggestion was closed prematurely.

[jira] [Resolved] (SPARK-18498) Clean up HDFSMetadataLog API for better testing

2016-11-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18498. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15924

[jira] [Commented] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2016-11-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15703146#comment-15703146 ] Michael Armbrust commented on SPARK-18541: -- I don't think you can use {{as}} in Python, as I

[jira] [Updated] (SPARK-18498) Clean up HDFSMetadataLog API for better testing

2016-11-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18498: - Target Version/s: 2.1.0 (was: 2.2.0) > Clean up HDFSMetadataLog API for better testing

[jira] [Created] (SPARK-18530) Kafka timestamp should be TimestampType

2016-11-21 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18530: Summary: Kafka timestamp should be TimestampType Key: SPARK-18530 URL: https://issues.apache.org/jira/browse/SPARK-18530 Project: Spark Issue Type:

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Priority: Critical (was: Major) > Don't push down current_timestamp for filters in

[jira] [Updated] (SPARK-18513) Record and recover watermark

2016-11-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18513: - Priority: Blocker (was: Major) > Record and recover watermark >

[jira] [Updated] (SPARK-18513) Record and recover watermark

2016-11-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18513: - Target Version/s: 2.1.0 > Record and recover watermark > >

[jira] [Created] (SPARK-18529) Timeouts shouldn't be AssertionErrors

2016-11-21 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18529: Summary: Timeouts shouldn't be AssertionErrors Key: SPARK-18529 URL: https://issues.apache.org/jira/browse/SPARK-18529 Project: Spark Issue Type:

[jira] [Updated] (SPARK-18516) Separate instantaneous state from progress performance statistics

2016-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18516: - Description: There are two types of information that you want to be able to extract from

[jira] [Updated] (SPARK-18516) Separate instantaneous state from progress performance statistics

2016-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18516: - Description: There are two types of information that you want to be able to extract from

[jira] [Created] (SPARK-18516) Separate instantaneous state from progress performance statistics

2016-11-20 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18516: Summary: Separate instantaneous state from progress performance statistics Key: SPARK-18516 URL: https://issues.apache.org/jira/browse/SPARK-18516 Project:

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-18 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Assignee: Tyson Condie > Don't push down current_timestamp for filters in

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-18 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Target Version/s: 2.1.0 (was: 2.2.0) > Don't push down current_timestamp for filters in

[jira] [Updated] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18497: - Target Version/s: 2.1.0 > ForeachSink fails with "assertion failed: No plan for

[jira] [Updated] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18497: - Priority: Critical (was: Major) > ForeachSink fails with "assertion failed: No plan for

[jira] [Resolved] (SPARK-18461) Improve docs on StreamingQueryListener and StreamingQuery.status

2016-11-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18461. -- Resolution: Fixed Issue resolved by pull request 15897

[jira] [Commented] (SPARK-17977) DataFrameReader and DataStreamReader should have an ancestor class

2016-11-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671287#comment-15671287 ] Michael Armbrust commented on SPARK-17977: -- No, they were actually the same class for a while.

[jira] [Resolved] (SPARK-18440) Fix FileStreamSink with aggregation + watermark + append mode

2016-11-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18440. -- Resolution: Fixed Issue resolved by pull request 15885

[jira] [Updated] (SPARK-18407) Inferred partition columns cause assertion error

2016-11-10 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18407: - Description: [This

[jira] [Created] (SPARK-18407) Inferred partition columns cause assertion error

2016-11-10 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18407: Summary: Inferred partition columns cause assertion error Key: SPARK-18407 URL: https://issues.apache.org/jira/browse/SPARK-18407 Project: Spark

[jira] [Commented] (SPARK-17691) Add aggregate function to collect list with maximum number of elements

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652723#comment-15652723 ] Michael Armbrust commented on SPARK-17691: -- I think that should be able to use mutable buffers

[jira] [Updated] (SPARK-18388) Running aggregation on many columns throws SOE

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18388: - Component/s: (was: Spark Core) SQL > Running aggregation on many

[jira] [Assigned] (SPARK-18211) Spark SQL ignores split.size

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-18211: Assignee: Michael Armbrust > Spark SQL ignores split.size >

[jira] [Resolved] (SPARK-18211) Spark SQL ignores split.size

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18211. -- Resolution: Not A Problem As of Spark 2.0 we do our own splitting/bin-packing of files

[jira] [Updated] (SPARK-10816) EventTime based sessionization

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10816: - Target Version/s: (was: 2.2.0) > EventTime based sessionization >

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17937: - Target Version/s: (was: 2.1.0) > Clarify Kafka offset semantics for Structured

[jira] [Resolved] (SPARK-17879) Don't compact metadata logs constantly into a single compacted file

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-17879. -- Resolution: Not A Problem > Don't compact metadata logs constantly into a single

[jira] [Comment Edited] (SPARK-18227) Parquet file stream sink create a hidden directory "_spark_metadata" cause the DataFrame read from directory failed

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651778#comment-15651778 ] Michael Armbrust edited comment on SPARK-18227 at 11/9/16 7:12 PM: --- The

[jira] [Commented] (SPARK-18227) Parquet file stream sink create a hidden directory "_spark_metadata" cause the DataFrame read from directory failed

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651778#comment-15651778 ] Michael Armbrust commented on SPARK-18227: -- The {{_spark_metadata}} directory holds the

[jira] [Resolved] (SPARK-18227) Parquet file stream sink create a hidden directory "_spark_metadata" cause the DataFrame read from directory failed

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18227. -- Resolution: Not A Problem > Parquet file stream sink create a hidden directory

[jira] [Resolved] (SPARK-15406) Structured streaming support for consuming from Kafka

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-15406. -- Resolution: Done Fix Version/s: 2.1.0 > Structured streaming support for

[jira] [Updated] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18371: - Component/s: (was: Structured Streaming) DStreams > Spark Streaming

[jira] [Updated] (SPARK-17815) Report committed offsets

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17815: - Issue Type: New Feature (was: Sub-task) Parent: (was: SPARK-15406) > Report

[jira] [Updated] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17344: - Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-15406) > Kafka

[jira] [Updated] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.1.0

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18057: - Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-15406) > Update

[jira] [Updated] (SPARK-18373) Make KafkaSource's failOnDataLoss=false work with Spark jobs

2016-11-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18373: - Target Version/s: 2.1.0 > Make KafkaSource's failOnDataLoss=false work with Spark jobs >

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Labels: (was: correctness) > Don't push down current_timestamp for filters in

[jira] [Commented] (SPARK-17691) Add aggregate function to collect list with maximum number of elements

2016-11-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1564#comment-1564 ] Michael Armbrust commented on SPARK-17691: -- +1 > Add aggregate function to collect list with

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Target Version/s: 2.2.0 > Don't push down current_timestamp for filters in

[jira] [Resolved] (SPARK-18295) Match up to_json to from_json in null safety

2016-11-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18295. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15792

[jira] [Updated] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-11-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-15044: - Target Version/s: 2.2.0 > spark-sql will throw "input path does not exist" exception if

[jira] [Reopened] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-11-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reopened SPARK-15044: -- I've actually heard this complaint from several different large Hive users. Sometimes

[jira] [Commented] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637622#comment-15637622 ] Michael Armbrust commented on SPARK-18277: -- We've been talking about better support for nested

[jira] [Commented] (SPARK-18258) Sinks need access to offset representation

2016-11-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637595#comment-15637595 ] Michael Armbrust commented on SPARK-18258: -- What sort of failures are you anticipating here? >

[jira] [Commented] (SPARK-18258) Sinks need access to offset representation

2016-11-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637491#comment-15637491 ] Michael Armbrust commented on SPARK-18258: -- I agree that we don't want to lock people in, which

[jira] [Updated] (SPARK-18258) Sinks need access to offset representation

2016-11-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18258: - Target Version/s: 2.2.0 > Sinks need access to offset representation >

[jira] [Updated] (SPARK-18260) from_json can throw a better exception when it can't find the column or be nullSafe

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18260: - Target Version/s: 2.1.0 Priority: Blocker (was: Major) > from_json can

[jira] [Commented] (SPARK-18260) from_json can throw a better exception when it can't find the column or be nullSafe

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634766#comment-15634766 ] Michael Armbrust commented on SPARK-18260: -- We should return null if the input is null. >

[jira] [Updated] (SPARK-18212) Flaky test: org.apache.spark.sql.kafka010.KafkaSourceSuite.assign from specific offsets

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18212: - Assignee: Cody Koeninger > Flaky test:

[jira] [Resolved] (SPARK-18212) Flaky test: org.apache.spark.sql.kafka010.KafkaSourceSuite.assign from specific offsets

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-18212. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15737

[jira] [Updated] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18254: - Target Version/s: 2.1.0 > UDFs don't see aliased column names >

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634122#comment-15634122 ] Michael Armbrust commented on SPARK-18254: -- Is this yet another bug caused by the generic

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17937: - Priority: Critical (was: Major) > Clarify Kafka offset semantics for Structured

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17937: - Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-15406) >

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17937: - Target Version/s: 2.1.0 > Clarify Kafka offset semantics for Structured Streaming >

[jira] [Commented] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-11-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634061#comment-15634061 ] Michael Armbrust commented on SPARK-17937: -- I'm going to pull this out from the parent JIRA as I

[jira] [Created] (SPARK-18234) Update mode

2016-11-02 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-18234: Summary: Update mode Key: SPARK-18234 URL: https://issues.apache.org/jira/browse/SPARK-18234 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-18212) Flaky test: org.apache.spark.sql.kafka010.KafkaSourceSuite.assign from specific offsets

2016-11-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628170#comment-15628170 ] Michael Armbrust commented on SPARK-18212: -- +1 to upping the timeout. We run with

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-11-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17937: - Component/s: Structured Streaming > Clarify Kafka offset semantics for Structured

[jira] [Updated] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.1.0

2016-11-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18057: - Component/s: Structured Streaming > Update structured streaming kafka from 10.0.1 to

[jira] [Updated] (SPARK-17343) Prerequisites for Kafka 0.8 support in Structured Streaming

2016-11-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17343: - Component/s: (was: DStreams) Structured Streaming > Prerequisites

[jira] [Updated] (SPARK-17837) Disaster recovery of offsets from WAL

2016-11-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17837: - Component/s: Structured Streaming > Disaster recovery of offsets from WAL >

[jira] [Updated] (SPARK-17834) Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-11-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17834: - Component/s: (was: SQL) Structured Streaming > Fetch the earliest

[jira] [Updated] (SPARK-17346) Kafka 0.10 support in Structured Streaming

2016-11-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17346: - Component/s: (was: DStreams) Structured Streaming > Kafka 0.10

[jira] [Updated] (SPARK-17345) Prerequisites for Kafka 0.10 support in Structured Streaming

2016-11-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17345: - Component/s: (was: DStreams) Structured Streaming > Prerequisites
