[jira] [Commented] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120415#comment-16120415 ] Shixiong Zhu commented on SPARK-21682: -- I agree that driver is a bottleneck. I already saw several

[jira] [Comment Edited] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-09 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120393#comment-16120393 ] Shixiong Zhu edited comment on SPARK-21453 at 8/9/17 6:13 PM: -- The error

[jira] [Commented] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-09 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120393#comment-16120393 ] Shixiong Zhu commented on SPARK-21453: -- The error message looks like the Kafka broker storing the

[jira] [Resolved] (SPARK-21596) Audit the places calling HDFSMetadataLog.get

2017-08-09 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21596. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.3.0

[jira] [Commented] (SPARK-21565) aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp

2017-08-07 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117173#comment-16117173 ] Shixiong Zhu commented on SPARK-21565: -- Resolved by https://github.com/apache/spark/pull/18840 >

[jira] [Assigned] (SPARK-21565) aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp

2017-08-07 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-21565: Assignee: Jose Torres > aggregate query fails with watermark on eventTime but works with

[jira] [Resolved] (SPARK-21565) aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp

2017-08-07 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21565. -- Resolution: Fixed Fix Version/s: 2.3.0 2.2.1 > aggregate query fails

[jira] [Updated] (SPARK-21374) Reading globbed paths from S3 into DF doesn't work if filesystem caching is disabled

2017-08-07 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21374: - Fix Version/s: 2.2.1 > Reading globbed paths from S3 into DF doesn't work if filesystem caching

[jira] [Commented] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-04 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114036#comment-16114036 ] Shixiong Zhu commented on SPARK-21453: -- I meant the exception in the JIRA description which looks

[jira] [Comment Edited] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-04 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114030#comment-16114030 ] Shixiong Zhu edited comment on SPARK-21453 at 8/4/17 7:02 AM: -- Could you

[jira] [Commented] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-04 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114030#comment-16114030 ] Shixiong Zhu commented on SPARK-21453: -- Could you increase "spark.kafka.producer.cache.timeout" to

[jira] [Commented] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-04 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114020#comment-16114020 ] Shixiong Zhu commented on SPARK-21453: -- [~ppanero] Could you provide all logs? I need the logs

[jira] [Updated] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-03 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21453: - Summary: Cached Kafka consumer may be closed too early (was: Streaming kafka source (structured

[jira] [Commented] (SPARK-21453) Streaming kafka source (structured spark)

2017-08-03 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113224#comment-16113224 ] Shixiong Zhu commented on SPARK-21453: -- Reopened this one. There might be some bug in caching Kafka

[jira] [Reopened] (SPARK-21453) Streaming kafka source (structured spark)

2017-08-03 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-21453: -- > Streaming kafka source (structured spark) > - > >

[jira] [Commented] (SPARK-21453) Streaming kafka source (structured spark)

2017-08-03 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113220#comment-16113220 ] Shixiong Zhu commented on SPARK-21453: -- [~ppanero] could you create a new ticket for the Kafka

[jira] [Commented] (SPARK-21453) Streaming kafka source (structured spark)

2017-08-03 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113218#comment-16113218 ] Shixiong Zhu commented on SPARK-21453: -- I'm aware of the Kafka producer issue. Right now a

[jira] [Resolved] (SPARK-21546) dropDuplicates with watermark yields RuntimeException due to binding failure

2017-08-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21546. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.3.0

[jira] [Commented] (SPARK-21590) Structured Streaming window start time should support negative values to adjust time zone

2017-08-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111467#comment-16111467 ] Shixiong Zhu commented on SPARK-21590: -- [~brkyvz] Yeah, some people may process data before 1970. I

[jira] [Commented] (SPARK-21565) aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp

2017-08-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111451#comment-16111451 ] Shixiong Zhu commented on SPARK-21565: -- Thanks for reporting it. I can reproduce the error in a unit

[jira] [Commented] (SPARK-21597) Avg event time calculated in progress may be wrong

2017-08-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111441#comment-16111441 ] Shixiong Zhu commented on SPARK-21597: -- Resolved by https://github.com/apache/spark/pull/18803 >

[jira] [Resolved] (SPARK-21597) Avg event time calculated in progress may be wrong

2017-08-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21597. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.3.0

[jira] [Updated] (SPARK-21597) Avg event time calculated in progress may be wrong

2017-08-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21597: - Priority: Minor (was: Major) > Avg event time calculated in progress may be wrong >

[jira] [Updated] (SPARK-21565) aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp

2017-08-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21565: - Description: *Short Description: * Aggregation query fails with eventTime as watermark column

[jira] [Commented] (SPARK-21590) Structured Streaming window start time should support negative values to adjust time zone

2017-08-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1610#comment-1610 ] Shixiong Zhu commented on SPARK-21590: -- Yeah, this is a bug. A timestamp can be negative. cc

[jira] [Created] (SPARK-21597) Avg event time calculated in progress may be wrong

2017-08-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-21597: Summary: Avg event time calculated in progress may be wrong Key: SPARK-21597 URL: https://issues.apache.org/jira/browse/SPARK-21597 Project: Spark Issue

[jira] [Created] (SPARK-21596) Audit the places calling HDFSMetadataLog.get

2017-08-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-21596: Summary: Audit the places calling HDFSMetadataLog.get Key: SPARK-21596 URL: https://issues.apache.org/jira/browse/SPARK-21596 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-21547) Spark cleaner cost too many time

2017-07-29 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106262#comment-16106262 ] Shixiong Zhu commented on SPARK-21547: -- Could you try 2.1.1 or 2.2.0? This may be just SPARK-18991

[jira] [Commented] (SPARK-21546) dropDuplicates with watermark yields RuntimeException due to binding failure

2017-07-28 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105426#comment-16105426 ] Shixiong Zhu commented on SPARK-21546: -- Yeah, good catch. The watermark column should be one of the

[jira] [Assigned] (SPARK-21517) Fetch local data via block manager cause oom

2017-07-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-21517: Assignee: zhoukang > Fetch local data via block manager cause oom >

[jira] [Resolved] (SPARK-21517) Fetch local data via block manager cause oom

2017-07-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21517. -- Resolution: Fixed Fix Version/s: 2.3.0 > Fetch local data via block manager cause oom >

[jira] [Commented] (SPARK-21488) Make saveAsTable() and createOrReplaceTempView() return dataframe of created table/ created view

2017-07-21 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095846#comment-16095846 ] Shixiong Zhu commented on SPARK-21488: -- Unfortunately, this will break binary compatibility. This

[jira] [Updated] (SPARK-21488) Make saveAsTable() and createOrReplaceTempView() return dataframe of created table/ created view

2017-07-21 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21488: - Component/s: (was: PySpark) (was: Spark Core) > Make saveAsTable() and

[jira] [Commented] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095744#comment-16095744 ] Shixiong Zhu commented on SPARK-21425: -- [~rdub] I just realized we never document local-cluster

[jira] [Comment Edited] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095323#comment-16095323 ] Shixiong Zhu edited comment on SPARK-21425 at 7/20/17 8:45 PM: --- [~srowen]

[jira] [Comment Edited] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095323#comment-16095323 ] Shixiong Zhu edited comment on SPARK-21425 at 7/20/17 8:38 PM: --- [~srowen]

[jira] [Commented] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095323#comment-16095323 ] Shixiong Zhu commented on SPARK-21425: -- [~srowen] The issue is static accumulators. Right? They

[jira] [Resolved] (SPARK-21463) Output of StructuredStreaming tables don't respect user specified schema when reading back the table

2017-07-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21463. -- Resolution: Fixed Fix Version/s: 2.3.0 > Output of StructuredStreaming tables don't

[jira] [Updated] (SPARK-21478) Unpersist a DF also unpersists related DFs

2017-07-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21478: - Component/s: (was: Spark Core) SQL > Unpersist a DF also unpersists related

[jira] [Resolved] (SPARK-21455) RpcFailure should be call on RpcResponseCallback.onFailure

2017-07-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21455. -- Resolution: Won't Fix > RpcFailure should be call on RpcResponseCallback.onFailure >

[jira] [Comment Edited] (SPARK-21378) Spark Poll timeout when specific offsets are passed

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092088#comment-16092088 ] Shixiong Zhu edited comment on SPARK-21378 at 7/18/17 8:01 PM: --- The data

[jira] [Commented] (SPARK-21378) Spark Poll timeout when specific offsets are passed

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092088#comment-16092088 ] Shixiong Zhu commented on SPARK-21378: -- The data must already be in Kafka when executors try to

[jira] [Comment Edited] (SPARK-21378) Spark Poll timeout when specific offsets are passed

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092088#comment-16092088 ] Shixiong Zhu edited comment on SPARK-21378 at 7/18/17 8:00 PM: --- The data

[jira] [Comment Edited] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092064#comment-16092064 ] Shixiong Zhu edited comment on SPARK-21425 at 7/18/17 7:41 PM: --- [~srowen]

[jira] [Commented] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092064#comment-16092064 ] Shixiong Zhu commented on SPARK-21425: -- [~srowen] 1. Long/DoubleAccumulator assumes that there is

[jira] [Commented] (SPARK-21460) Spark dynamic allocation breaks when ListenerBus event queue runs full

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092034#comment-16092034 ] Shixiong Zhu commented on SPARK-21460: -- Make sense. Reopened it. > Spark dynamic allocation breaks

[jira] [Reopened] (SPARK-21460) Spark dynamic allocation breaks when ListenerBus event queue runs full

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-21460: -- > Spark dynamic allocation breaks when ListenerBus event queue runs full >

[jira] [Commented] (SPARK-21460) Spark dynamic allocation breaks when ListenerBus event queue runs full

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091964#comment-16091964 ] Shixiong Zhu commented on SPARK-21460: -- [~Tagar] Right. SPARK-18838 probably will create a dedicated

[jira] [Resolved] (SPARK-21461) Spark Streaming crashes if CSV file has no read permissions

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21461. -- Resolution: Won't Fix This is a user error rather than a Spark bug. Ignoring such errors will

[jira] [Resolved] (SPARK-21460) Spark dynamic allocation breaks when ListenerBus event queue runs full

2017-07-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21460. -- Resolution: Duplicate This will be addressed in SPARK-18838 > Spark dynamic allocation breaks

[jira] [Comment Edited] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-17 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090918#comment-16090918 ] Shixiong Zhu edited comment on SPARK-21425 at 7/18/17 1:06 AM: --- I remember

[jira] [Commented] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-17 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090918#comment-16090918 ] Shixiong Zhu commented on SPARK-21425: -- I remember that if making LongAccumulator, DoubleAccumulator

[jira] [Resolved] (SPARK-21421) Add the query id as a local property to allow source and sink using it

2017-07-14 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21421. -- Resolution: Fixed Fix Version/s: 2.3.0 > Add the query id as a local property to allow

[jira] [Created] (SPARK-21421) Add the query id as a local property to allow source and sink using it

2017-07-14 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-21421: Summary: Add the query id as a local property to allow source and sink using it Key: SPARK-21421 URL: https://issues.apache.org/jira/browse/SPARK-21421 Project:

[jira] [Commented] (SPARK-20376) Make StateStoreProvider plugable

2017-07-13 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086148#comment-16086148 ] Shixiong Zhu commented on SPARK-20376: -- [~prashant_] Sorry. I forgot to resolve it. Already closed

[jira] [Resolved] (SPARK-20376) Make StateStoreProvider plugable

2017-07-13 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-20376. -- Resolution: Fixed Fix Version/s: 2.3.0 > Make StateStoreProvider plugable >

[jira] [Commented] (SPARK-21374) Reading globbed paths from S3 into DF doesn't work if filesystem caching is disabled

2017-07-12 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084764#comment-16084764 ] Shixiong Zhu commented on SPARK-21374: -- Yeah, org.apache.spark.deploy.SparkHadoopUtil.globPath uses

[jira] [Comment Edited] (SPARK-21378) Spark Poll timeout when specific offsets are passed

2017-07-12 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084772#comment-16084772 ] Shixiong Zhu edited comment on SPARK-21378 at 7/12/17 9:49 PM: --- bq. Digging

[jira] [Commented] (SPARK-21378) Spark Poll timeout when specific offsets are passed

2017-07-12 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084772#comment-16084772 ] Shixiong Zhu commented on SPARK-21378: -- bq. Digging deeper shows that there's an assert statement

[jira] [Updated] (SPARK-21378) Spark Poll timeout when specific offsets are passed

2017-07-12 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21378: - Component/s: (was: Spark Core) DStreams > Spark Poll timeout when specific

[jira] [Resolved] (SPARK-21146) Master/Worker should handle and shutdown when any thread gets UncaughtException

2017-07-12 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21146. -- Resolution: Fixed Assignee: Devaraj K Fix Version/s: 2.3.0 > Master/Worker

[jira] [Commented] (SPARK-18971) Netty issue may cause the shuffle client hang

2017-07-11 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082676#comment-16082676 ] Shixiong Zhu commented on SPARK-18971: -- [~andreu.urruela] Update Spark to 2.2.0 (not yet announce,

[jira] [Updated] (SPARK-21369) Don't use Scala classes in external shuffle service

2017-07-10 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21369: - Description: Right now the external shuffle service uses Scala Tuple2. However, the Scala

[jira] [Updated] (SPARK-21369) Don't use Scala classes in external shuffle service

2017-07-10 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21369: - Description: Right now the external shuffle service uses Scala Tuple2. However, the Scala

[jira] [Created] (SPARK-21369) Don't use Scala classes in external shuffle service

2017-07-10 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-21369: Summary: Don't use Scala classes in external shuffle service Key: SPARK-21369 URL: https://issues.apache.org/jira/browse/SPARK-21369 Project: Spark Issue

[jira] [Resolved] (SPARK-19659) Fetch big blocks to disk when shuffle-read

2017-07-09 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-19659. -- Resolution: Fixed Fix Version/s: 2.2.0 The major work is done in 2.2.0. But it's

[jira] [Resolved] (SPARK-21069) Add rate source to programming guide

2017-07-08 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21069. -- Resolution: Fixed Assignee: Prashant Sharma Fix Version/s: 2.3.0

[jira] [Resolved] (SPARK-21329) Make EventTimeWatermarkExec explicitly UnaryExecNode

2017-07-06 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21329. -- Resolution: Fixed Assignee: Jacek Laskowski Fix Version/s: 2.3.0 > Make

[jira] [Resolved] (SPARK-21267) Improvements to the Structured Streaming programming guide

2017-07-06 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21267. -- Resolution: Fixed Fix Version/s: 2.3.0 2.2.1 > Improvements to the

[jira] [Resolved] (SPARK-21248) Flaky test: o.a.s.sql.kafka010.KafkaSourceSuite.assign from specific offsets (failOnDataLoss: true)

2017-07-05 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21248. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.3.0 > Flaky test:

[jira] [Updated] (SPARK-19659) Fetch big blocks to disk when shuffle-read

2017-06-29 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-19659: - Fix Version/s: (was: 2.2.0) > Fetch big blocks to disk when shuffle-read >

[jira] [Reopened] (SPARK-19659) Fetch big blocks to disk when shuffle-read

2017-06-29 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-19659: -- Reopened this as it's disabled. > Fetch big blocks to disk when shuffle-read >

[jira] [Resolved] (SPARK-21188) releaseAllLocksForTask should synchronize the whole method

2017-06-29 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21188. -- Resolution: Fixed Assignee: Feng Liu Fix Version/s: 2.3.0 >

[jira] [Commented] (SPARK-21253) Cannot fetch big blocks to disk

2017-06-29 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068876#comment-16068876 ] Shixiong Zhu commented on SPARK-21253: -- [~q79969786] did you run Spark 2.2.0-rcX on Yarn which has a

[jira] [Created] (SPARK-21248) Flaky test: o.a.s.sql.kafka010.KafkaSourceSuite.assign from specific offsets (failOnDataLoss: true)

2017-06-28 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-21248: Summary: Flaky test: o.a.s.sql.kafka010.KafkaSourceSuite.assign from specific offsets (failOnDataLoss: true) Key: SPARK-21248 URL:

[jira] [Resolved] (SPARK-21216) Streaming DataFrames fail to join with Hive tables

2017-06-28 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21216. -- Resolution: Fixed Fix Version/s: 2.3.0 > Streaming DataFrames fail to join with Hive

[jira] [Resolved] (SPARK-21153) Time windowing for tumbling windows can use a project instead of expand + filter

2017-06-26 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21153. -- Resolution: Fixed Fix Version/s: 2.3.0 > Time windowing for tumbling windows can use a

[jira] [Resolved] (SPARK-21192) Preserve State Store provider class configuration across StreamingQuery restarts

2017-06-23 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21192. -- Resolution: Fixed Assignee: Tathagata Das (was: Apache Spark) Fix Version/s:

[jira] [Assigned] (SPARK-20599) ConsoleSink should work with write (batch)

2017-06-22 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-20599: Assignee: Lubo Zhang > ConsoleSink should work with write (batch) >

[jira] [Updated] (SPARK-21168) KafkaRDD should always set kafka clientId.

2017-06-22 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21168: - Component/s: (was: Structured Streaming) DStreams > KafkaRDD should always

[jira] [Issue Comment Deleted] (SPARK-21167) Path is not decoded correctly when reading output of FileSink

2017-06-22 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21167: - Comment: was deleted (was: User 'dijingran' has created a pull request for this issue:

[jira] [Resolved] (SPARK-20599) ConsoleSink should work with write (batch)

2017-06-22 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-20599. -- Resolution: Fixed Fix Version/s: 2.3.0 > ConsoleSink should work with write (batch) >

[jira] [Issue Comment Deleted] (SPARK-21167) Path is not decoded correctly when reading output of FileSink

2017-06-22 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21167: - Comment: was deleted (was: User 'dijingran' has created a pull request for this issue:

[jira] [Resolved] (SPARK-21167) Path is not decoded correctly when reading output of FileSink

2017-06-22 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21167. -- Resolution: Fixed Fix Version/s: 2.3.0 2.2.1 2.1.2

[jira] [Created] (SPARK-21167) Path is not decoded correctly when reading output of FileSink

2017-06-21 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-21167: Summary: Path is not decoded correctly when reading output of FileSink Key: SPARK-21167 URL: https://issues.apache.org/jira/browse/SPARK-21167 Project: Spark

[jira] [Updated] (SPARK-21147) the schema of socket/rate source can not be set.

2017-06-21 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21147: - Affects Version/s: 2.2.0 > the schema of socket/rate source can not be set. >

[jira] [Resolved] (SPARK-21147) the schema of socket/rate source can not be set.

2017-06-21 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21147. -- Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.3.0 > the schema of

[jira] [Updated] (SPARK-21147) the schema of socket/rate source can not be set.

2017-06-21 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21147: - Summary: the schema of socket/rate source can not be set. (was: the schema of socket source can

[jira] [Updated] (SPARK-21123) Options for file stream source are in a wrong table

2017-06-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21123: - Fix Version/s: 2.1.2 > Options for file stream source are in a wrong table >

[jira] [Assigned] (SPARK-21123) Options for file stream source are in a wrong table

2017-06-20 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-21123: Assignee: Assaf Mendelson > Options for file stream source are in a wrong table >

[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054593#comment-16054593 ] Shixiong Zhu commented on SPARK-21143: -- The reason you cannot use 4.0.42.Final is because you are

[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054592#comment-16054592 ] Shixiong Zhu commented on SPARK-21143: -- As Netty is so core to Spark, it's too risky to upgrade from

[jira] [Updated] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21142: - Component/s: (was: Structured Streaming) DStreams >

[jira] [Resolved] (SPARK-21123) Options for file stream source are in a wrong table

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21123. -- Resolution: Fixed Fix Version/s: 2.3.0 2.2.0 > Options for file

[jira] [Updated] (SPARK-16430) Add an option in file stream source to read 1 file at a time

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-16430: - Fix Version/s: (was: 2.1.0) 2.0.0 > Add an option in file stream source

[jira] [Updated] (SPARK-16430) Add an option in file stream source to read 1 file at a time

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-16430: - Fix Version/s: 2.1.0 > Add an option in file stream source to read 1 file at a time >

[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict

2017-06-16 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052242#comment-16052242 ] Shixiong Zhu commented on SPARK-21065: -- Please don't use `spark.streaming.concurrentJobs` if

[jira] [Resolved] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict

2017-06-16 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21065. -- Resolution: Won't Fix > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict

[jira] [Updated] (SPARK-21123) Options for file stream source are in a wrong table

2017-06-16 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21123: - Labels: starter (was: ) > Options for file stream source are in a wrong table >

[jira] [Updated] (SPARK-21123) Options for file stream source are in a wrong table

2017-06-16 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21123: - Affects Version/s: 2.2.0 > Options for file stream source are in a wrong table >
