[jira] [Updated] (SPARK-35865) Remove await (syncMode) in ChunkFetchRequestHandler

2021-06-23 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-35865: Description: SPARK-24355 introduces syncMode to mitigate the issue of SASL timeouts by throttling

[jira] [Updated] (SPARK-35865) Remove await (syncMode) in ChunkFetchRequestHandler

2021-06-23 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-35865: Attachment: openblock-compare.png > Remove await (syncMode) in ChunkFetchRequestHandler >

[jira] [Created] (SPARK-35865) Remove await (syncMode) in ChunkFetchRequestHandler

2021-06-23 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-35865: --- Summary: Remove await (syncMode) in ChunkFetchRequestHandler Key: SPARK-35865 URL: https://issues.apache.org/jira/browse/SPARK-35865 Project: Spark Issue

[jira] [Updated] (SPARK-35865) Remove await (syncMode) in ChunkFetchRequestHandler

2021-06-23 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-35865: Attachment: openblock.png > Remove await (syncMode) in ChunkFetchRequestHandler >
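The description says syncMode mitigates SASL timeouts by throttling. One common way a request handler throttles itself is to block until each response write completes before handling the next request. Below is a minimal Python sketch of that pattern, assuming a hypothetical `SendChannel` whose writes complete asynchronously on a thread pool. `SendChannel`, `send_async` and `handle_requests` are illustrative names, not Spark or Netty APIs.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


class SendChannel:
    """Hypothetical channel whose writes complete asynchronously on a thread pool."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self.in_flight = 0        # sends submitted but not yet finished
        self.max_in_flight = 0    # high-water mark of concurrent sends
        self.sent: List[str] = []

    def send_async(self, chunk: str) -> Future:
        with self._lock:
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)
        return self._pool.submit(self._send, chunk)

    def _send(self, chunk: str) -> None:
        time.sleep(0.01)  # simulated network latency
        with self._lock:
            self.sent.append(chunk)
            self.in_flight -= 1

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def handle_requests(channel: SendChannel, chunks: List[str], sync_mode: bool) -> None:
    pending: List[Future] = []
    for chunk in chunks:
        future = channel.send_async(chunk)
        if sync_mode:
            # Block until this response is written: throttles the handler thread.
            future.result()
        else:
            # Do not wait; several sends may overlap.
            pending.append(future)
    for future in pending:
        future.result()
```

With `sync_mode=True` at most one send is ever in flight, at the cost of the handler thread sitting idle during each write. That idle waiting is the kind of cost the ticket's title proposes to remove.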

[jira] [Created] (SPARK-35010) nestedSchemaPruning causes issue when reading hive generated Orc files

2021-04-09 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-35010: --- Summary: nestedSchemaPruning causes issue when reading hive generated Orc files Key: SPARK-35010 URL: https://issues.apache.org/jira/browse/SPARK-35010 Project: Spark

[jira] [Commented] (SPARK-34779) ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

2021-03-31 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312498#comment-17312498 ] Baohe Zhang commented on SPARK-34779: - Thanks for pointing it out! I wasn't aware that task peak

[jira] [Updated] (SPARK-34845) ProcfsMetricsGetter.computeAllMetrics may return partial metrics when some of child pids metrics are missing

2021-03-23 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34845: Description: When the procfs metrics of some child pids are unavailable,

[jira] [Updated] (SPARK-34845) ProcfsMetricsGetter.computeAllMetrics shouldn't return partial metrics when some of child pids metrics are missing

2021-03-23 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34845: Description: When the procfs metrics of some child pids are unavailable,

[jira] [Created] (SPARK-34845) ProcfsMetricsGetter.computeAllMetrics shouldn't return partial metrics when some of child pids metrics are missing

2021-03-23 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-34845: --- Summary: ProcfsMetricsGetter.computeAllMetrics shouldn't return partial metrics when some of child pids metrics are missing Key: SPARK-34845 URL:
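The all-or-nothing rule in the title can be illustrated with a small Python sketch. `compute_all_metrics` here is a hypothetical stand-in, not Spark's `ProcfsMetricsGetter`. `read_metrics` is an assumed callback that returns None when a pid's procfs entry cannot be read. Returning None to signal "no valid reading" is also an assumption about how a caller might handle the missing case.

```python
from typing import Callable, Dict, Iterable, Optional

Metrics = Dict[str, int]


def compute_all_metrics(
    pids: Iterable[int],
    read_metrics: Callable[[int], Optional[Metrics]],
) -> Optional[Metrics]:
    """Sum metrics over a process tree, or return None if any pid is missing."""
    total: Metrics = {}
    for pid in pids:
        metrics = read_metrics(pid)
        if metrics is None:
            # A sum over only the readable pids would silently under-report,
            # so report nothing for this poll instead of a partial total.
            return None
        for name, value in metrics.items():
            total[name] = total.get(name, 0) + value
    return total
```

For example, with `data = {1: {"rss": 100}, 2: {"rss": 50}}`, calling `compute_all_metrics([1, 2], data.get)` sums both pids, while `compute_all_metrics([1, 2, 3], data.get)` returns None because pid 3 has no entry.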

[jira] [Created] (SPARK-34779) ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

2021-03-17 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-34779: --- Summary: ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs Key: SPARK-34779 URL: https://issues.apache.org/jira/browse/SPARK-34779
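The ticket's idea is to keep a stage's entry until a heartbeat has reported it. That could look roughly like the Python sketch below. Only the `stageTCMP` name comes from the ticket. `MetricsPollerSketch`, its methods, and the `[running task count, per-metric peaks]` entry layout are hypothetical and do not represent Spark's implementation. The key point is that task completion only decrements the count, and finished stages are dropped after their peaks have gone out in a heartbeat.

```python
from typing import Dict, List, Tuple

StageKey = Tuple[int, int]  # (stage_id, stage_attempt_id), assumed key shape


class MetricsPollerSketch:
    def __init__(self, num_metrics: int) -> None:
        self.num_metrics = num_metrics
        # stage key -> [running task count, per-metric peak values]
        self.stage_tcmp: Dict[StageKey, list] = {}

    def on_task_start(self, stage: StageKey) -> None:
        entry = self.stage_tcmp.setdefault(stage, [0, [0] * self.num_metrics])
        entry[0] += 1

    def on_task_completion(self, stage: StageKey) -> None:
        # Only decrement; the entry and its peaks stay until the next heartbeat.
        self.stage_tcmp[stage][0] -= 1

    def poll(self, latest: List[int]) -> None:
        # Fold the latest executor-level reading into every tracked stage's peaks.
        for entry in self.stage_tcmp.values():
            entry[1] = [max(p, v) for p, v in zip(entry[1], latest)]

    def heartbeat(self) -> Dict[StageKey, List[int]]:
        report = {stage: list(peaks) for stage, (_, peaks) in self.stage_tcmp.items()}
        # Now that the peaks have been reported, drop stages with no running tasks.
        for stage in [s for s, (count, _) in self.stage_tcmp.items() if count == 0]:
            del self.stage_tcmp[stage]
        return report
```

If the entry were removed as soon as its last task finished, any peak observed between the previous heartbeat and that moment would never be reported. Deferring removal to the heartbeat closes that gap.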

[jira] [Commented] (SPARK-32924) Web UI sort on duration is wrong

2021-03-04 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295639#comment-17295639 ] Baohe Zhang commented on SPARK-32924: - [~dongjoon] This is my Jira id. > Web UI sort on duration is

[jira] [Commented] (SPARK-34545) PySpark Python UDF return inconsistent results when applying 2 UDFs with different return type to 2 columns together

2021-02-26 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17291954#comment-17291954 ] Baohe Zhang commented on SPARK-34545: - Simpler code to reproduce the error: {code:python} >>> from

[jira] [Commented] (SPARK-34545) PySpark Python UDF return inconsistent results when applying 2 UDFs with different return type to 2 columns together

2021-02-26 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17291896#comment-17291896 ] Baohe Zhang commented on SPARK-34545: - This is a correctness bug, so I would like to raise the

[jira] [Updated] (SPARK-34545) PySpark Python UDF return inconsistent results when applying 2 UDFs with different return type to 2 columns together

2021-02-26 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34545: Priority: Blocker (was: Critical) > PySpark Python UDF return inconsistent results when applying

[jira] [Updated] (SPARK-34545) PySpark Python UDF return inconsistent results when applying 2 UDFs with different return type to 2 columns together

2021-02-25 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34545: Summary: PySpark Python UDF return inconsistent results when applying 2 UDFs with different

[jira] [Updated] (SPARK-34545) PySpark Python UDF return inconsistent results when applying UDFs to 2 columns together

2021-02-25 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34545: Description: Python UDF returns inconsistent results between evaluating 2 columns together and

[jira] [Created] (SPARK-34545) PySpark Python UDF return inconsistent results when applying UDFs to 2 columns together

2021-02-25 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-34545: --- Summary: PySpark Python UDF return inconsistent results when applying UDFs to 2 columns together Key: SPARK-34545 URL: https://issues.apache.org/jira/browse/SPARK-34545

[jira] [Updated] (SPARK-34545) PySpark Python UDF return inconsistent results when applying UDFs to 2 columns together

2021-02-25 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34545: Priority: Critical (was: Major) > PySpark Python UDF return inconsistent results when applying

[jira] [Updated] (SPARK-34545) PySpark Python UDF return inconsistent results when applying UDFs to 2 columns together

2021-02-25 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34545: Component/s: SQL > PySpark Python UDF return inconsistent results when applying UDFs to 2 >

[jira] [Commented] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277485#comment-17277485 ] Baohe Zhang commented on SPARK-34336: - Full benchmark results are added as txt attachments. > Use

[jira] [Updated] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34336: Attachment: generic_data_read.txt > Use GenericData as Avro serialization data model can improve

[jira] [Updated] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34336: Attachment: base_read.txt > Use GenericData as Avro serialization data model can improve Avro

[jira] [Updated] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34336: Attachment: generic_data_write.txt > Use GenericData as Avro serialization data model can improve

[jira] [Updated] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34336: Attachment: read_comparison.png > Use GenericData as Avro serialization data model can improve

[jira] [Commented] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277483#comment-17277483 ] Baohe Zhang commented on SPARK-34336: - Column chart comparison on avg time: Avro write:

[jira] [Updated] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34336: Attachment: base_write.txt > Use GenericData as Avro serialization data model can improve Avro

[jira] [Updated] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-34336: Attachment: write_comparison.png > Use GenericData as Avro serialization data model can improve

[jira] [Created] (SPARK-34336) Use GenericData as Avro serialization data model can improve Avro write/read performance

2021-02-02 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-34336: --- Summary: Use GenericData as Avro serialization data model can improve Avro write/read performance Key: SPARK-34336 URL: https://issues.apache.org/jira/browse/SPARK-34336

[jira] [Commented] (SPARK-33031) scheduler with blacklisting doesn't appear to pick up new executor added

2020-12-28 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255685#comment-17255685 ] Baohe Zhang commented on SPARK-33031: - New tasks won't be scheduled because the node is marked as

[jira] [Commented] (SPARK-33029) Standalone mode blacklist executors page UI marks driver as blacklisted

2020-12-28 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255682#comment-17255682 ] Baohe Zhang commented on SPARK-33029: - With the blacklist feature enabled, by default, a node will

[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254669#comment-17254669 ] Baohe Zhang commented on SPARK-33906: - A deeper underlying reason seems to be that the stage complete

[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254668#comment-17254668 ] Baohe Zhang commented on SPARK-33906: - [~dongjoon] Yes. > SPARK UI Executors page stuck when

[jira] [Updated] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-33906: Description: How to reproduce it? On macOS, in standalone mode, open a spark-shell and run

[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254647#comment-17254647 ] Baohe Zhang commented on SPARK-33906: - I will put up a PR soon. > SPARK UI Executors page stuck when

[jira] [Updated] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-33906: Attachment: executor-page.png > SPARK UI Executors page stuck when

[jira] [Created] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-33906: --- Summary: SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset Key: SPARK-33906 URL: https://issues.apache.org/jira/browse/SPARK-33906 Project:

[jira] [Commented] (SPARK-26399) Add new stage-level REST APIs and parameters

2020-12-04 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244341#comment-17244341 ] Baohe Zhang commented on SPARK-26399: - It seems the work for this ticket has already been done by

[jira] [Created] (SPARK-33215) Speed up event log download by skipping UI rebuild

2020-10-21 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-33215: --- Summary: Speed up event log download by skipping UI rebuild Key: SPARK-33215 URL: https://issues.apache.org/jira/browse/SPARK-33215 Project: Spark Issue Type:

[jira] [Updated] (SPARK-32350) Add batch write support on LevelDB to improve performance of HybridStore

2020-07-17 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-32350: Description: The idea is to improve the performance of HybridStore by adding batch write support

[jira] [Created] (SPARK-32350) Add batch write support on LevelDB to improve performance of HybridStore

2020-07-17 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-32350: --- Summary: Add batch write support on LevelDB to improve performance of HybridStore Key: SPARK-32350 URL: https://issues.apache.org/jira/browse/SPARK-32350 Project:
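Batching, as described for HybridStore, can be sketched as a writer that buffers objects and applies them to the backing store in one call per batch instead of one call per object. In this Python sketch a plain dict stands in for LevelDB. `BatchingWriter` and `batch_size` are hypothetical names. LevelDB's own analogous mechanism is a write batch that is applied atomically.

```python
from typing import Any, Dict, List, Tuple


class BatchingWriter:
    """Hypothetical writer that buffers puts and applies them in batches."""

    def __init__(self, store: Dict[Any, Any], batch_size: int = 3) -> None:
        self.store = store  # stand-in for a LevelDB-backed store
        self.batch_size = batch_size
        self.pending: List[Tuple[Any, Any]] = []
        self.flush_count = 0  # number of backing-store calls made

    def write(self, key: Any, value: Any) -> None:
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # One backing-store call per batch instead of one per object.
        self.store.update(self.pending)
        self.pending.clear()
        self.flush_count += 1
```

A caller must invoke `flush()` once more at the end so that a final, partially filled batch is not lost.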

[jira] [Created] (SPARK-31664) Race in YARN scheduler shutdown leads to uncaught SparkException "Could not find CoarseGrainedScheduler"

2020-05-08 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-31664: --- Summary: Race in YARN scheduler shutdown leads to uncaught SparkException "Could not find CoarseGrainedScheduler" Key: SPARK-31664 URL:

[jira] [Updated] (SPARK-31608) Add a hybrid KVStore to make UI loading faster

2020-04-30 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-31608: Description: This is a follow-up for the work done by Hieu Huynh in 2019. Add a new class

[jira] [Updated] (SPARK-31608) Add a hybrid KVStore to make UI loading faster

2020-04-29 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-31608: Description: This is a follow-up for the work done by Hieu Huynh in 2019. Add a new class

[jira] [Created] (SPARK-31608) Add a hybrid KVStore to make UI loading faster

2020-04-29 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-31608: --- Summary: Add a hybrid KVStore to make UI loading faster Key: SPARK-31608 URL: https://issues.apache.org/jira/browse/SPARK-31608 Project: Spark Issue Type:
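A hybrid store can be sketched as follows. It serves writes and reads from memory while the event log is being replayed, then copies everything into the disk-backed store and switches over. This minimal Python sketch assumes that design. `HybridStoreSketch`, `switch_to_disk`, and the use of plain dicts for both tiers are hypothetical simplifications, not Spark's `KVStore` API.

```python
import threading
from typing import Any, Dict, Optional


class HybridStoreSketch:
    """Hypothetical two-tier store: in-memory while loading, disk-backed afterwards."""

    def __init__(self, disk_store: Dict[Any, Any]) -> None:
        self.memory: Optional[Dict[Any, Any]] = {}
        self.disk = disk_store  # stand-in for a LevelDB-backed store
        self._use_disk = False
        self._lock = threading.Lock()

    def _active(self) -> Dict[Any, Any]:
        # Caller must hold self._lock.
        return self.disk if self._use_disk else self.memory

    def write(self, key: Any, value: Any) -> None:
        with self._lock:
            self._active()[key] = value

    def read(self, key: Any) -> Any:
        with self._lock:
            return self._active()[key]

    def switch_to_disk(self) -> None:
        """Copy the in-memory data to disk, then serve from disk and free memory."""
        with self._lock:
            if self._use_disk:
                return
            self.disk.update(self.memory)
            self._use_disk = True
            self.memory = None
```

`switch_to_disk` can run on a background thread, e.g. `threading.Thread(target=store.switch_to_disk).start()`. For simplicity, this sketch holds the lock for the whole copy, which blocks readers during the dump. A real implementation would want to avoid that.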

[jira] [Updated] (SPARK-31584) NullPointerException when parsing event log with InMemoryStore

2020-04-27 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-31584: Attachment: errorstack.txt > NullPointerException when parsing event log with InMemoryStore >

[jira] [Created] (SPARK-31584) NullPointerException when parsing event log with InMemoryStore

2020-04-27 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-31584: --- Summary: NullPointerException when parsing event log with InMemoryStore Key: SPARK-31584 URL: https://issues.apache.org/jira/browse/SPARK-31584 Project: Spark

[jira] [Commented] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-15 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084431#comment-17084431 ] Baohe Zhang commented on SPARK-31380: - Did you run your application with Spark 3? I tested it in my

[jira] [Updated] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-15 Thread Baohe Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Baohe Zhang updated SPARK-31380: Attachment: image-2020-04-15-18-16-18-254.png > Peak Execution Memory Quantile is not displayed