[jira] [Created] (SPARK-44517) first operator should respect the nullability of child expression as well as ignoreNulls option

2023-07-23 Thread Nan Zhu (Jira)
Nan Zhu created SPARK-44517: --- Summary: first operator should respect the nullability of child expression as well as ignoreNulls option Key: SPARK-44517 URL: https://issues.apache.org/jira/browse/SPARK-44517

[jira] [Commented] (SEDONA-211) Enforce release managers to use JDK 8

2022-12-22 Thread Nan Zhu (Jira)
[ https://issues.apache.org/jira/browse/SEDONA-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651443#comment-17651443 ] Nan Zhu commented on SEDONA-211: [~jiayu] Nan from SafeGraph here, we were hit by this in Sedona 1.2 as

[jira] [Created] (SPARK-33940) allow configuring the max column name length in csv writer

2020-12-29 Thread Nan Zhu (Jira)
Nan Zhu created SPARK-33940: --- Summary: allow configuring the max column name length in csv writer Key: SPARK-33940 URL: https://issues.apache.org/jira/browse/SPARK-33940 Project: Spark Issue Type:

[jira] [Commented] (SPARK-32351) Partially pushed partition filters are not explained

2020-10-19 Thread Nan Zhu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217238#comment-17217238 ] Nan Zhu commented on SPARK-32351: - [~hyukjin.kwon] nit: could you reassign this to me? [~codingcat] ;)

[jira] [Resolved] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu resolved SPARK-26862. - Resolution: Invalid

[jira] [Commented] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766355#comment-16766355 ] Nan Zhu commented on SPARK-26862: - [~srowen] I don't think so, as the same parquet files can be accessed

[jira] [Updated] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-26862: Description: When I run the following query over an internal table (A and B are typed as string, C is

[jira] [Commented] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766327#comment-16766327 ] Nan Zhu commented on SPARK-26862: - [~felixcheung] > assertion failed in ParquetRowConverter >

[jira] [Created] (SPARK-26862) assertion failed in ParquetRowConverter

2019-02-12 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-26862: --- Summary: assertion failed in ParquetRowConverter Key: SPARK-26862 URL: https://issues.apache.org/jira/browse/SPARK-26862 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-24797) Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when build the data source table

2018-07-13 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu resolved SPARK-24797. - Resolution: Won't Fix > Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when build

[jira] [Created] (SPARK-24797) Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when build the data source table

2018-07-12 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-24797: --- Summary: Analyzer should respect spark.sql.hive.convertMetastoreOrc/Parquet when build the data source table Key: SPARK-24797 URL: https://issues.apache.org/jira/browse/SPARK-24797

[jira] [Updated] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/MXNET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated MXNET-62: - Component/s: Scala API > improve the quality of Spark integration

[jira] [Updated] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/MXNET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated MXNET-62: - Labels: spark (was: ) > improve the quality of Spark integration

[jira] [Assigned] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/MXNET-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu reassigned MXNET-62: Assignee: Nan Zhu > improve the quality of Spark integration

[jira] [Created] (MXNET-62) improve the quality of Spark integration

2018-03-08 Thread Nan Zhu (JIRA)
Nan Zhu created MXNET-62: Summary: improve the quality of Spark integration Key: MXNET-62 URL: https://issues.apache.org/jira/browse/MXNET-62 Project: Apache MXNet Issue Type: Improvement

[jira] [Commented] (SPARK-22599) Avoid extra reading for cached table

2017-12-24 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303022#comment-16303022 ] Nan Zhu commented on SPARK-22599: - [~rajesh.balamohan] no, it means that SPARK-22599 and master are

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297453#comment-16297453 ] Nan Zhu commented on SPARK-22765: - I took a look at the code; one of the possibilities is as follows:

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-18 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295958#comment-16295958 ] Nan Zhu commented on SPARK-22765: - [~xuefuz] Regarding this, "The symptom is that newly allocated

[jira] [Commented] (SPARK-21656) spark dynamic allocation should not idle timeout executors when there are enough tasks to run on them

2017-12-18 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295282#comment-16295282 ] Nan Zhu commented on SPARK-21656: - NOTE: the issue fixed by https://github.com/apache/spark/pull/18874 >

[jira] [Created] (SPARK-22790) add a configurable factor to describe HadoopFsRelation's size

2017-12-14 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-22790: --- Summary: add a configurable factor to describe HadoopFsRelation's size Key: SPARK-22790 URL: https://issues.apache.org/jira/browse/SPARK-22790 Project: Spark Issue

[jira] [Commented] (SPARK-22790) add a configurable factor to describe HadoopFsRelation's size

2017-12-14 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291985#comment-16291985 ] Nan Zhu commented on SPARK-22790: - created per discussion in https://github.com/apache/spark/pull/19864

[jira] [Commented] (SPARK-22680) SparkSQL scan all partitions when the specified partitions are not exists in parquet formatted table

2017-12-07 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282248#comment-16282248 ] Nan Zhu commented on SPARK-22680: - How did you observe that Spark scans all partitions? I tried to reproduce

[jira] [Created] (SPARK-22673) InMemoryRelation should utilize on-disk table stats whenever possible

2017-12-01 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-22673: --- Summary: InMemoryRelation should utilize on-disk table stats whenever possible Key: SPARK-22673 URL: https://issues.apache.org/jira/browse/SPARK-22673 Project: Spark

[jira] [Updated] (SPARK-22599) Avoid extra reading for cached table

2017-11-29 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-22599: Description: In the current implementation of Spark, InMemoryTableExec reads all data in a cached table,

[jira] [Updated] (SPARK-22599) Avoid extra reading for cached table

2017-11-23 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-22599: Description: In the current implementation of Spark, InMemoryTableExec reads all data in a cached table,

[jira] [Created] (SPARK-22599) Avoid extra reading for cached table

2017-11-23 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-22599: --- Summary: Avoid extra reading for cached table Key: SPARK-22599 URL: https://issues.apache.org/jira/browse/SPARK-22599 Project: Spark Issue Type: Improvement

[jira] [Created] (LIVY-410) support rate throttling in livy

2017-10-05 Thread Nan Zhu (JIRA)
Nan Zhu created LIVY-410: Summary: support rate throttling in livy Key: LIVY-410 URL: https://issues.apache.org/jira/browse/LIVY-410 Project: Livy Issue Type: Improvement Reporter: Nan

[jira] [Closed] (SPARK-21197) Tricky use case makes dead application struggle for a long duration

2017-06-24 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu closed SPARK-21197. --- Resolution: Won't Fix > Tricky use case makes dead application struggle for a long duration >

[jira] [Commented] (SPARK-21197) Tricky use case makes dead application struggle for a long duration

2017-06-24 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062162#comment-16062162 ] Nan Zhu commented on SPARK-21197: - yeah, after rethinking about the solution, I think daemon thread would

[jira] [Updated] (SPARK-21197) Tricky use case makes dead application struggle for a long duration

2017-06-23 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-21197: Summary: Tricky use case makes dead application struggle for a long duration (was: Tricky use cases makes

[jira] [Created] (SPARK-21197) Tricky use cases makes dead application struggle for a long duration

2017-06-23 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-21197: --- Summary: Tricky use cases makes dead application struggle for a long duration Key: SPARK-21197 URL: https://issues.apache.org/jira/browse/SPARK-21197 Project: Spark

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-03 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036157#comment-16036157 ] Nan Zhu commented on SPARK-20928: - if I understand correctly the tasks will be "long-term" tasks just

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-05-30 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030379#comment-16030379 ] Nan Zhu commented on SPARK-20928: - Hi, is there any description on what does it mean? > Continuous

[jira] [Commented] (SPARK-4921) TaskSetManager mistakenly returns PROCESS_LOCAL for NO_PREF tasks

2017-05-23 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021417#comment-16021417 ] Nan Zhu commented on SPARK-4921: I forgot most of details...but the final conclusion was that "it's a typo

[jira] [Commented] (SPARK-20811) GBT Classifier failed with mysterious StackOverflowError

2017-05-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018246#comment-16018246 ] Nan Zhu commented on SPARK-20811: - thanks, let me try it > GBT Classifier failed with mysterious

[jira] [Created] (SPARK-20811) GBT Classifier failed with mysterious StackOverflowException

2017-05-19 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-20811: --- Summary: GBT Classifier failed with mysterious StackOverflowException Key: SPARK-20811 URL: https://issues.apache.org/jira/browse/SPARK-20811 Project: Spark Issue

[jira] [Updated] (SPARK-20811) GBT Classifier failed with mysterious StackOverflowError

2017-05-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-20811: Summary: GBT Classifier failed with mysterious StackOverflowError (was: GBT Classifier failed with

[jira] [Commented] (SPARK-20251) Spark streaming skips batches in a case of failure

2017-04-20 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977589#comment-15977589 ] Nan Zhu commented on SPARK-20251: - ignore my previous comments...the moving on Spark Streaming is due to

[jira] [Comment Edited] (SPARK-20251) Spark streaming skips batches in a case of failure

2017-04-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962318#comment-15962318 ] Nan Zhu edited comment on SPARK-20251 at 4/10/17 12:16 AM: --- more details here,

[jira] [Commented] (SPARK-20251) Spark streaming skips batches in a case of failure

2017-04-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962318#comment-15962318 ] Nan Zhu commented on SPARK-20251: - more details here, by "be proceeding", I mean it is expected that the

[jira] [Comment Edited] (SPARK-20251) Spark streaming skips batches in a case of failure

2017-04-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962313#comment-15962313 ] Nan Zhu edited comment on SPARK-20251 at 4/9/17 11:57 PM: -- why this is an

[jira] [Commented] (SPARK-20251) Spark streaming skips batches in a case of failure

2017-04-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962313#comment-15962313 ] Nan Zhu commented on SPARK-20251: - why this is an invalid report? I have been observing the same behavior

[jira] [Commented] (SPARK-19789) Add the shortcut of .format("parquet").option("path", "/hdfs/path").partitionBy("col1", "col2").start()

2017-03-12 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906782#comment-15906782 ] Nan Zhu commented on SPARK-19789: - [~zsxwing] mind reviewing the PR? > Add the shortcut of

[jira] [Created] (SPARK-19789) Add the shortcut of .format("parquet").option("path", "/hdfs/path").partitionBy("col1", "col2").start()

2017-03-01 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-19789: --- Summary: Add the shortcut of .format("parquet").option("path", "/hdfs/path").partitionBy("col1", "col2").start() Key: SPARK-19789 URL: https://issues.apache.org/jira/browse/SPARK-19789

[jira] [Updated] (SPARK-19788) DataStreamReader/DataStreamWriter.option shall accept user-defined type

2017-03-01 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-19788: Description: There are many other data sources/sinks which have very different configuration ways than

[jira] [Updated] (SPARK-19788) DataStreamReader/DataStreamWriter.option shall accept user-defined type

2017-03-01 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-19788: Description: There are many other data sources/sinks which have very different configuration ways than

[jira] [Comment Edited] (SPARK-19788) DataStreamReader/DataStreamWriter.option shall accept user-defined type

2017-03-01 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890522#comment-15890522 ] Nan Zhu edited comment on SPARK-19788 at 3/1/17 4:45 PM: - another drawback is

[jira] [Updated] (SPARK-19788) DataStreamReader/DataStreamWriter.option shall accept user-defined type

2017-03-01 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-19788: Summary: DataStreamReader/DataStreamWriter.option shall accept user-defined type (was:

[jira] [Commented] (SPARK-19788) DataStreamReader.option shall accept user-defined type

2017-03-01 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890522#comment-15890522 ] Nan Zhu commented on SPARK-19788: - another drawback is that it might look like incompatible with

[jira] [Created] (SPARK-19788) DataStreamReader.option shall accept user-defined type

2017-03-01 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-19788: --- Summary: DataStreamReader.option shall accept user-defined type Key: SPARK-19788 URL: https://issues.apache.org/jira/browse/SPARK-19788 Project: Spark Issue Type:

[jira] [Commented] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-02-27 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886999#comment-15886999 ] Nan Zhu commented on SPARK-19280: - [~zsxwing] please let me know if we agree on that 2 is something we

[jira] [Updated] (SPARK-19499) Add more notes in the comments of Sink.addBatch()

2017-02-07 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-19499: Description: addBatch method in Sink trait is supposed to be a synchronous method to coordinate with the

[jira] [Updated] (SPARK-19499) Add more notes in the comments of Sink.addBatch()

2017-02-07 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-19499: Summary: Add more notes in the comments of Sink.addBatch() (was: Add more description in the comments of

[jira] [Created] (SPARK-19499) Add more description in the comments of Sink.addBatch()

2017-02-07 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-19499: --- Summary: Add more description in the comments of Sink.addBatch() Key: SPARK-19499 URL: https://issues.apache.org/jira/browse/SPARK-19499 Project: Spark Issue Type:

[jira] [Commented] (SPARK-19233) Inconsistent Behaviour of Spark Streaming Checkpoint

2017-02-03 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851787#comment-15851787 ] Nan Zhu commented on SPARK-19233: - ping > Inconsistent Behaviour of Spark Streaming Checkpoint >

[jira] [Commented] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-02-03 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851786#comment-15851786 ] Nan Zhu commented on SPARK-19280: - ping > Failed Recovery from checkpoint caused by the multi-threads

[jira] (SPARK-19233) Inconsistent Behaviour of Spark Streaming Checkpoint

2017-01-29 Thread Nan Zhu (JIRA)
Nan Zhu commented on SPARK-19233

[jira] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-29 Thread Nan Zhu (JIRA)
Nan Zhu commented on SPARK-19280

[jira] [Created] (SPARK-19358) LiveListenerBus shall log the event name when dropping them due to a fully filled queue

2017-01-24 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-19358: --- Summary: LiveListenerBus shall log the event name when dropping them due to a fully filled queue Key: SPARK-19358 URL: https://issues.apache.org/jira/browse/SPARK-19358

[jira] [Comment Edited] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-20 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831209#comment-15831209 ] Nan Zhu edited comment on SPARK-19280 at 1/20/17 1:24 PM: -- [~zsxwing] Thanks for

[jira] [Commented] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831217#comment-15831217 ] Nan Zhu commented on SPARK-19280: - BTW, do I need to highlight the KafkaDStream issue as another JIRA,

[jira] [Commented] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831209#comment-15831209 ] Nan Zhu commented on SPARK-19280: - [~zsxwing] Thanks for reply 0) I do not think the content in

[jira] [Updated] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-19280: Description: In one of our applications, we found the following issue, the application recovering from a

[jira] [Commented] (SPARK-19233) Inconsistent Behaviour of Spark Streaming Checkpoint

2017-01-19 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831098#comment-15831098 ] Nan Zhu commented on SPARK-19233: - By filtering generatedRDDs, I may bring some confusion here, what I

[jira] [Updated] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-18 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-19280: Description: In one of our applications, we found the following issue, the application recovering from a

[jira] [Commented] (SPARK-19278) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-18 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828621#comment-15828621 ] Nan Zhu commented on SPARK-19278: - any one would help to close this one? as it is a duplication of

[jira] [Commented] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-18 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828602#comment-15828602 ] Nan Zhu commented on SPARK-19280: - [~zsxwing] would you mind confirming about this? it would be great if

[jira] [Created] (SPARK-19280) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-18 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-19280: --- Summary: Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler Key: SPARK-19280 URL: https://issues.apache.org/jira/browse/SPARK-19280

[jira] [Created] (SPARK-19278) Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler

2017-01-18 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-19278: --- Summary: Failed Recovery from checkpoint caused by the multi-threads issue in Spark Streaming scheduler Key: SPARK-19278 URL: https://issues.apache.org/jira/browse/SPARK-19278

[jira] [Commented] (SPARK-19233) Inconsistent Behaviour of Spark Streaming Checkpoint

2017-01-15 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823364#comment-15823364 ] Nan Zhu commented on SPARK-19233: - [~zsxwing] so, another potential issue I found in Spark Streaming

[jira] [Commented] (SPARK-19233) Inconsistent Behaviour of Spark Streaming Checkpoint

2017-01-15 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823359#comment-15823359 ] Nan Zhu commented on SPARK-19233: - The category of this issue is Improvement which is subject to be

[jira] [Created] (SPARK-19233) Inconsistent Behaviour of Spark Streaming Checkpoint

2017-01-15 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-19233: --- Summary: Inconsistent Behaviour of Spark Streaming Checkpoint Key: SPARK-19233 URL: https://issues.apache.org/jira/browse/SPARK-19233 Project: Spark Issue Type:

[jira] [Commented] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2017-01-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813632#comment-15813632 ] Nan Zhu commented on SPARK-18905: - [~zsxwing] If you agree on the conclusion above, I will file a PR >

[jira] [Commented] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2017-01-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813560#comment-15813560 ] Nan Zhu commented on SPARK-18905: - eat my words... when we have queued up batches, we do need

[jira] [Comment Edited] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2017-01-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813459#comment-15813459 ] Nan Zhu edited comment on SPARK-18905 at 1/10/17 1:16 AM: -- yeah, but the

[jira] [Commented] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2017-01-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813459#comment-15813459 ] Nan Zhu commented on SPARK-18905: - yeah, but the downTime including all batches from "checkpoint time" to

[jira] [Comment Edited] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2017-01-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813434#comment-15813434 ] Nan Zhu edited comment on SPARK-18905 at 1/10/17 1:05 AM: -- Hi, [~zsxwing]

[jira] [Commented] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2017-01-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813434#comment-15813434 ] Nan Zhu commented on SPARK-18905: - Hi, [~zsxwing] Thanks for the reply, After testing in our

[jira] [Updated] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2016-12-16 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-18905: Description: the current implementation of Spark streaming considers a batch is completed no matter the

[jira] [Updated] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2016-12-16 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-18905: Description: the current implementation of Spark streaming considers a batch is completed no matter the

[jira] [Updated] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2016-12-16 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-18905: Description: the current implementation of Spark streaming considers a batch is completed no matter the

[jira] [Created] (SPARK-18905) Potential Issue of Semantics of BatchCompleted

2016-12-16 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-18905: --- Summary: Potential Issue of Semantics of BatchCompleted Key: SPARK-18905 URL: https://issues.apache.org/jira/browse/SPARK-18905 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-17347) Encoder in Dataset example has incorrect type

2016-08-31 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-17347: Summary: Encoder in Dataset example has incorrect type (was: Encoder in Dataset example is incorrect on

[jira] [Created] (SPARK-17347) Encoder in Dataset example is incorrect on type

2016-08-31 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-17347: --- Summary: Encoder in Dataset example is incorrect on type Key: SPARK-17347 URL: https://issues.apache.org/jira/browse/SPARK-17347 Project: Spark Issue Type: Bug

[jira] [Closed] (SPARK-14247) Spark does not compile with CDH-5.4.x due to the possible bug of ivy.....

2016-03-29 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu closed SPARK-14247. --- Resolution: Not A Problem > Spark does not compile with CDH-5.4.x due to the possible bug of ivy. >

[jira] [Comment Edited] (SPARK-14247) Spark does not compile with CDH-5.4.x due to the possible bug of ivy.....

2016-03-29 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216719#comment-15216719 ] Nan Zhu edited comment on SPARK-14247 at 3/29/16 7:39 PM: -- thanks [~sowen], it

[jira] [Commented] (SPARK-14247) Spark does not compile with CDH-5.4.x due to the possible bug of ivy.....

2016-03-29 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216719#comment-15216719 ] Nan Zhu commented on SPARK-14247: - thanks [~sowen], it seems that changing the hadoop.version name solves

[jira] [Updated] (SPARK-14247) Spark does not compile with CDH-5.4.x due to the possible bug of ivy.....

2016-03-29 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-14247: Priority: Minor (was: Major) > Spark does not compile with CDH-5.4.x due to the possible bug of ivy.

[jira] [Commented] (SPARK-14247) Spark does not compile with CDH-5.4.x due to the possible bug of ivy.....

2016-03-29 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216661#comment-15216661 ] Nan Zhu commented on SPARK-14247: - [~srowen] I always blindly copied "CDH.*" string from Spark building

[jira] [Created] (SPARK-14247) Spark does not compile with CDH-5.4.x due to the possible bug of ivy.....

2016-03-29 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-14247: --- Summary: Spark does not compile with CDH-5.4.x due to the possible bug of ivy. Key: SPARK-14247 URL: https://issues.apache.org/jira/browse/SPARK-14247 Project: Spark

[jira] [Commented] (SPARK-8547) xgboost exploration

2016-03-15 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195682#comment-15195682 ] Nan Zhu commented on SPARK-8547: FYI, we released a solution to integrate XGBoost with Spark directly

[jira] [Commented] (SPARK-13868) Random forest accuracy exploration

2016-03-15 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195686#comment-15195686 ] Nan Zhu commented on SPARK-13868: - FYI, we released a solution to integrate XGBoost with Spark directly

[jira] [Created] (SPARK-13227) Risky apply() in OpenHashMap

2016-02-06 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-13227: --- Summary: Risky apply() in OpenHashMap Key: SPARK-13227 URL: https://issues.apache.org/jira/browse/SPARK-13227 Project: Spark Issue Type: Bug Components:

[jira] [Commented] (SPARK-12786) Actor demo does not demonstrate usable code

2016-01-13 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096187#comment-15096187 ] Nan Zhu commented on SPARK-12786: - the only place it relies on AkkaUtil is to create an ActorSystem,

[jira] [Commented] (SPARK-12713) UI Executor page should keep links around to executors that died

2016-01-08 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089884#comment-15089884 ] Nan Zhu commented on SPARK-12713: - I attached a PR and two duplicate JIRAs which are addressing the same

[jira] [Commented] (SPARK-12469) Consistent Accumulators for Spark

2015-12-25 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071799#comment-15071799 ] Nan Zhu commented on SPARK-12469: - Just to bring the previous discussions about the topic here,

[jira] [Comment Edited] (SPARK-12469) Consistent Accumulators for Spark

2015-12-25 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071799#comment-15071799 ] Nan Zhu edited comment on SPARK-12469 at 12/26/15 2:44 AM: --- Just to bring the

[jira] [Commented] (SPARK-12237) Unsupported message RpcMessage causes message retries

2015-12-10 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051499#comment-15051499 ] Nan Zhu commented on SPARK-12237: - if that's the case, I don't think it would happen in the real world

[jira] [Commented] (SPARK-12237) Unsupported message RpcMessage causes message retries

2015-12-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048651#comment-15048651 ] Nan Zhu commented on SPARK-12237: - may I ask how you found this issue? It seems that Master received

[jira] [Commented] (SPARK-12229) How to Perform spark submit of application written in scala from Node js

2015-12-09 Thread Nan Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048663#comment-15048663 ] Nan Zhu commented on SPARK-12229: - https://github.com/spark-jobserver/spark-jobserver might be a good
