[jira] [Created] (SPARK-11765) Avoid assign UI port between browser unsafe ports (or just 4045: lockd)

2015-11-16 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-11765: Summary: Avoid assign UI port between browser unsafe ports (or just 4045: lockd) Key: SPARK-11765 URL: https://issues.apache.org/jira/browse/SPARK-11765 Project:

[jira] [Commented] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011168#comment-15011168 ] Jungtaek Lim commented on SPARK-11818: -- Yes, assembly jar has HBase libs, and unzipped assembly jar

[jira] [Comment Edited] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011202#comment-15011202 ] Jungtaek Lim edited comment on SPARK-11818 at 11/18/15 3:30 PM: One thing

[jira] [Commented] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011202#comment-15011202 ] Jungtaek Lim commented on SPARK-11818: -- One thing to note is, if I remove "spark.repl.class.uri"

[jira] [Commented] (SPARK-11818) ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011223#comment-15011223 ] Jungtaek Lim commented on SPARK-11818: -- Removed "REPL" from title since it could confuse us to

[jira] [Updated] (SPARK-11818) ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-11818: - Summary: ExecutorClassLoader cannot see any resources from parent class loader (was: REPL

[jira] [Comment Edited] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011168#comment-15011168 ] Jungtaek Lim edited comment on SPARK-11818 at 11/18/15 3:18 PM: Yes,

[jira] [Comment Edited] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011168#comment-15011168 ] Jungtaek Lim edited comment on SPARK-11818 at 11/18/15 3:19 PM: Yes,

[jira] [Updated] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-11818: - Affects Version/s: 1.4.1 > REPL ExecutorClassLoader cannot see any resources from parent class

[jira] [Updated] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-11818: - Component/s: (was: Spark Core) Spark Shell > REPL ExecutorClassLoader

[jira] [Created] (SPARK-11818) REPL ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-11818: Summary: REPL ExecutorClassLoader cannot see any resources from parent class loader Key: SPARK-11818 URL: https://issues.apache.org/jira/browse/SPARK-11818 Project:

[jira] [Commented] (SPARK-11818) ExecutorClassLoader cannot see any resources from parent class loader

2015-11-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012591#comment-15012591 ] Jungtaek Lim commented on SPARK-11818: -- More information: I resolved the origin issue I affected via

[jira] [Created] (SPARK-24336) Support 'pass through' transformation in BasicOperators

2018-05-21 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24336: Summary: Support 'pass through' transformation in BasicOperators Key: SPARK-24336 URL: https://issues.apache.org/jira/browse/SPARK-24336 Project: Spark

[jira] [Updated] (SPARK-24311) Refactor HDFSBackedStateStoreProvider to remove duplicated logic between operations on delta file and snapshot file

2018-05-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-24311: - Description: The structure of delta file and snapshot file is same, but the operations are

[jira] [Updated] (SPARK-24311) Refactor HDFSBackedStateStoreProvider to remove duplicated logic between operations on delta file and snapshot file

2018-05-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-24311: - Summary: Refactor HDFSBackedStateStoreProvider to remove duplicated logic between operations on

[jira] [Updated] (SPARK-24311) Refactor HDFSBackedStateStoreProvide to remove duplicated logic between operations on delta file and snapshot file

2018-05-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-24311: - Summary: Refactor HDFSBackedStateStoreProvide to remove duplicated logic between operations on

[jira] [Created] (SPARK-24311) Refactor HDFSBackedStateStoreProvide to remove duplicated logic between operations on delta file and operations on snapshot file

2018-05-17 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24311: Summary: Refactor HDFSBackedStateStoreProvide to remove duplicated logic between operations on delta file and operations on snapshot file Key: SPARK-24311 URL:

[jira] [Created] (SPARK-24485) Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider

2018-06-07 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24485: Summary: Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider Key: SPARK-24485 URL: https://issues.apache.org/jira/browse/SPARK-24485

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-06-05 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501334#comment-16501334 ] Jungtaek Lim commented on SPARK-18057: -- Is Kafka 2.0.0 client compatible with Kafka 1.x and 0.10.x

[jira] [Created] (SPARK-24466) TextSocketMicroBatchReader no longer works with nc utility

2018-06-04 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24466: Summary: TextSocketMicroBatchReader no longer works with nc utility Key: SPARK-24466 URL: https://issues.apache.org/jira/browse/SPARK-24466 Project: Spark

[jira] [Commented] (SPARK-24466) TextSocketMicroBatchReader no longer works with nc utility

2018-06-04 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501216#comment-16501216 ] Jungtaek Lim commented on SPARK-24466: -- I'm working on this. Will provide the patch soon. >

[jira] [Created] (SPARK-24441) Expose total size of states in HDFSBackedStateStoreProvider

2018-05-31 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24441: Summary: Expose total size of states in HDFSBackedStateStoreProvider Key: SPARK-24441 URL: https://issues.apache.org/jira/browse/SPARK-24441 Project: Spark

[jira] [Updated] (SPARK-24441) Expose total estimated size of states in HDFSBackedStateStoreProvider

2018-06-02 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-24441: - Summary: Expose total estimated size of states in HDFSBackedStateStoreProvider (was: Expose

[jira] [Updated] (SPARK-24441) Expose total size of states in HDFSBackedStateStoreProvider

2018-06-02 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-24441: - Description: While Spark exposes state metrics for single state, Spark still doesn't expose

[jira] [Commented] (SPARK-24634) Add a new metric regarding number of rows later than watermark

2018-06-22 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520948#comment-16520948 ] Jungtaek Lim commented on SPARK-24634: -- Working on this. Will submit a patch soon. > Add a new

[jira] [Created] (SPARK-24634) Add a new metric regarding number of rows later than watermark

2018-06-22 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24634: Summary: Add a new metric regarding number of rows later than watermark Key: SPARK-24634 URL: https://issues.apache.org/jira/browse/SPARK-24634 Project: Spark

[jira] [Created] (SPARK-24637) Add metrics regarding state and watermark to dropwizard metrics

2018-06-23 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24637: Summary: Add metrics regarding state and watermark to dropwizard metrics Key: SPARK-24637 URL: https://issues.apache.org/jira/browse/SPARK-24637 Project: Spark

[jira] [Updated] (SPARK-24637) Add metrics regarding state and watermark to dropwizard metrics

2018-06-23 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-24637: - Description: Though Spark provides option to enable stream metrics into Dropwizard, it only

[jira] [Resolved] (SPARK-24336) Support 'pass through' transformation in BasicOperators

2018-06-19 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-24336. -- Resolution: Invalid > Support 'pass through' transformation in BasicOperators >

[jira] [Commented] (SPARK-24717) Split out min retain version of state for memory in HDFSBackedStateStoreProvider

2018-07-02 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529962#comment-16529962 ] Jungtaek Lim commented on SPARK-24717: -- I have a patch for this, but now on top of SPARK-24441.

[jira] [Created] (SPARK-24717) Split out min retain version of state for memory in HDFSBackedStateStoreProvider

2018-07-02 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24717: Summary: Split out min retain version of state for memory in HDFSBackedStateStoreProvider Key: SPARK-24717 URL: https://issues.apache.org/jira/browse/SPARK-24717

[jira] [Commented] (SPARK-24161) Enable debug package feature on structured streaming

2018-05-02 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461713#comment-16461713 ] Jungtaek Lim commented on SPARK-24161: -- I have a working patch. Will raise a PR soon. > Enable

[jira] [Created] (SPARK-24161) Enable debug package feature on structured streaming

2018-05-02 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24161: Summary: Enable debug package feature on structured streaming Key: SPARK-24161 URL: https://issues.apache.org/jira/browse/SPARK-24161 Project: Spark Issue

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-05-03 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462149#comment-16462149 ] Jungtaek Lim commented on SPARK-10816: -- I'm still curious about out-of-box support on session

[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463189#comment-16463189 ] Jungtaek Lim commented on SPARK-23703: -- Actually I haven't heard about multiple watermarks on same

[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463209#comment-16463209 ] Jungtaek Lim commented on SPARK-23703: -- Agreed. Is it worth discussing on the dev mailing list? Or we

[jira] [Commented] (SPARK-21429) show on structured Dataset is equivalent to writeStream to console once

2018-05-03 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462225#comment-16462225 ] Jungtaek Lim commented on SPARK-21429: -- I agree that shortcut would help, but a bit afraid that such

[jira] [Commented] (SPARK-23703) Collapse sequential watermarks

2018-05-03 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462339#comment-16462339 ] Jungtaek Lim commented on SPARK-23703: -- [~joseph.torres] Could you provide simple code or query

[jira] [Resolved] (SPARK-24311) Refactor HDFSBackedStateStoreProvider to remove duplicated logic between operations on delta file and snapshot file

2018-07-31 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-24311. -- Resolution: Won't Fix As I got feedback on

[jira] [Created] (SPARK-24995) Flaky tests: FlatMapGroupsWithStateSuite.flatMapGroupsWithState - streaming with processing time timeout

2018-08-01 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24995: Summary: Flaky tests: FlatMapGroupsWithStateSuite.flatMapGroupsWithState - streaming with processing time timeout Key: SPARK-24995 URL:

[jira] [Commented] (SPARK-20568) Delete files after processing in structured streaming

2018-08-15 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581979#comment-16581979 ] Jungtaek Lim commented on SPARK-20568: -- For me, the feature looks like the missing spot for

[jira] [Commented] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-08-21 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588389#comment-16588389 ] Jungtaek Lim commented on SPARK-24763: -- [~tdas] Got it. Thanks for the input. > Remove redundant

[jira] [Commented] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-08-21 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588101#comment-16588101 ] Jungtaek Lim commented on SPARK-24763: -- [~tdas] One question regarding fix version: I guess we

[jira] [Comment Edited] (SPARK-25106) A new Kafka consumer gets created for every batch

2018-08-24 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591278#comment-16591278 ] Jungtaek Lim edited comment on SPARK-25106 at 8/24/18 7:46 AM: --- I played

[jira] [Commented] (SPARK-25106) A new Kafka consumer gets created for every batch

2018-08-24 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591278#comment-16591278 ] Jungtaek Lim commented on SPARK-25106: -- I played with the project and looks like it is affected by 

[jira] [Created] (SPARK-25245) Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming

2018-08-26 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-25245: Summary: Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming Key: SPARK-25245 URL:

[jira] [Updated] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-25151: - Environment: (was: KafkaDataConsumer contains its own logic for caching

[jira] [Created] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-25151: Summary: Apply Apache Commons Pool to KafkaDataConsumer Key: SPARK-25151 URL: https://issues.apache.org/jira/browse/SPARK-25151 Project: Spark Issue Type:

[jira] [Updated] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-25151: - Description: KafkaDataConsumer contains its own logic for caching InternalKafkaConsumer which

[jira] [Commented] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584478#comment-16584478 ] Jungtaek Lim commented on SPARK-25151: -- Working on it. Will provide a patch shortly. > Apply

[jira] [Comment Edited] (SPARK-23714) Add metrics for cached KafkaConsumer

2018-08-20 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585587#comment-16585587 ] Jungtaek Lim edited comment on SPARK-23714 at 8/20/18 8:06 AM: ---

[jira] [Commented] (SPARK-23714) Add metrics for cached KafkaConsumer

2018-08-20 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585587#comment-16585587 ] Jungtaek Lim commented on SPARK-23714: -- [~yuzhih...@gmail.com] Maybe we can just apply Apache

[jira] [Commented] (SPARK-23682) Memory issue with Spark structured streaming

2018-07-04 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533078#comment-16533078 ] Jungtaek Lim commented on SPARK-23682: -- [~bondyk] [~ccifuentes] [~akorzhuev] This may not be due to

[jira] [Comment Edited] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-12 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541367#comment-16541367 ] Jungtaek Lim edited comment on SPARK-24763 at 7/12/18 9:21 AM: --- I had a

[jira] [Comment Edited] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-12 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541367#comment-16541367 ] Jungtaek Lim edited comment on SPARK-24763 at 7/12/18 9:20 AM: --- I had a

[jira] [Commented] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-12 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541367#comment-16541367 ] Jungtaek Lim commented on SPARK-24763: -- I had a chance to craft various key/value cases (bigger

[jira] [Comment Edited] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-12 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541367#comment-16541367 ] Jungtaek Lim edited comment on SPARK-24763 at 7/12/18 1:53 PM: --- I had a

[jira] [Created] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-08 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-24763: Summary: Remove redundant key data from value in streaming aggregation Key: SPARK-24763 URL: https://issues.apache.org/jira/browse/SPARK-24763 Project: Spark

[jira] [Comment Edited] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-08 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536539#comment-16536539 ] Jungtaek Lim edited comment on SPARK-24763 at 7/9/18 5:19 AM: -- > Spark

[jira] [Comment Edited] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-08 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536539#comment-16536539 ] Jungtaek Lim edited comment on SPARK-24763 at 7/9/18 5:18 AM: -- > Spark

[jira] [Commented] (SPARK-24763) Remove redundant key data from value in streaming aggregation

2018-07-08 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536539#comment-16536539 ] Jungtaek Lim commented on SPARK-24763: -- > Spark version * 2.4.0-SNAPSHOT * commit: 

[jira] [Commented] (SPARK-24036) Stateful operators in continuous processing

2018-04-24 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451653#comment-16451653 ] Jungtaek Lim commented on SPARK-24036: -- Hello, I'm quite interested in this issue since I just read

[jira] [Comment Edited] (SPARK-24036) Stateful operators in continuous processing

2018-04-25 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453209#comment-16453209 ] Jungtaek Lim edited comment on SPARK-24036 at 4/25/18 10:54 PM: Maybe

[jira] [Commented] (SPARK-24036) Stateful operators in continuous processing

2018-04-25 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453209#comment-16453209 ] Jungtaek Lim commented on SPARK-24036: -- Maybe better to share what I've observed from continuous

[jira] [Commented] (SPARK-24036) Stateful operators in continuous processing

2018-04-25 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453290#comment-16453290 ] Jungtaek Lim commented on SPARK-24036: -- Btw, I would like to say the idea for iterator hack and

[jira] [Commented] (SPARK-24630) SPIP: Support SQLStreaming in Spark

2018-10-07 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641327#comment-16641327 ] Jungtaek Lim commented on SPARK-24630: -- [~Jackey Lee] For DDL it would be better to participate

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-12 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647816#comment-16647816 ] Jungtaek Lim commented on SPARK-10816: -- I'd add my thought about the approach 2. It would work best

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-16 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651404#comment-16651404 ] Jungtaek Lim commented on SPARK-10816: -- Update: I've crafted another performance test for testing

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-16 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652925#comment-16652925 ] Jungtaek Lim commented on SPARK-10816: -- I've been thinking about requirements on state store for

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-16 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652878#comment-16652878 ] Jungtaek Lim commented on SPARK-10816: -- Update on test results for both data patterns: 1. plenty

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-19 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656435#comment-16656435 ] Jungtaek Lim edited comment on SPARK-10816 at 10/19/18 8:24 AM:

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-19 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656435#comment-16656435 ] Jungtaek Lim commented on SPARK-10816: -- [~msukmanowsky] In case if you would like to still

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-21 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658599#comment-16658599 ] Jungtaek Lim edited comment on SPARK-10816 at 10/22/18 5:54 AM: Just

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-22 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658599#comment-16658599 ] Jungtaek Lim edited comment on SPARK-10816 at 10/22/18 6:04 AM: Just

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-21 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658599#comment-16658599 ] Jungtaek Lim commented on SPARK-10816: -- Just going back to review the origin comment of [~zsxwing].

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-22 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658935#comment-16658935 ] Jungtaek Lim edited comment on SPARK-10816 at 10/22/18 12:29 PM: - Have

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-22 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658935#comment-16658935 ] Jungtaek Lim commented on SPARK-10816: -- Have been thinking about [3] but can't find good approach

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-20 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656435#comment-16656435 ] Jungtaek Lim edited comment on SPARK-10816 at 10/21/18 12:19 AM: -

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654627#comment-16654627 ] Jungtaek Lim edited comment on SPARK-10816 at 10/18/18 4:36 AM: Just ran

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654627#comment-16654627 ] Jungtaek Lim commented on SPARK-10816: -- Just ran another performance test to check my new trial of

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654940#comment-16654940 ] Jungtaek Lim commented on SPARK-10816: -- Just updated the code to move out logic to load/store state

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-18 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654627#comment-16654627 ] Jungtaek Lim edited comment on SPARK-10816 at 10/18/18 12:42 PM: - Just

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-12 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647923#comment-16647923 ] Jungtaek Lim commented on SPARK-10816: -- Btw, I've just rebuilt with Baidu's updated patch and reran

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-10-24 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662007#comment-16662007 ] Jungtaek Lim edited comment on SPARK-10816 at 10/24/18 9:37 AM: UPDATE:

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-24 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662007#comment-16662007 ] Jungtaek Lim commented on SPARK-10816: -- UPDATE: Since we don't access state session list by index

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-29 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668120#comment-16668120 ] Jungtaek Lim commented on SPARK-10816: -- UPDATE: 1. Added additional new data pattern in benchmark

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-29 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668151#comment-16668151 ] Jungtaek Lim commented on SPARK-10816: -- Let me summarize about new benchmark run: For data pattern

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-10-29 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668157#comment-16668157 ] Jungtaek Lim commented on SPARK-10816: -- I feel we would be OK even if we load sessions for a key into

[jira] [Commented] (SPARK-20568) Delete files after processing in structured streaming

2018-10-31 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670904#comment-16670904 ] Jungtaek Lim commented on SPARK-20568: -- [~zsxwing] Yes I'll see the time slot and take a look at

[jira] [Commented] (SPARK-20568) Delete files after processing in structured streaming

2018-11-02 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672805#comment-16672805 ] Jungtaek Lim commented on SPARK-20568: -- [~zsxwing] I've thought about it a bit. I'm not familiar

[jira] [Comment Edited] (SPARK-10816) EventTime based sessionization

2018-11-01 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672478#comment-16672478 ] Jungtaek Lim edited comment on SPARK-10816 at 11/2/18 2:49 AM: --- UPDATE: I

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2018-11-01 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672478#comment-16672478 ] Jungtaek Lim commented on SPARK-10816: -- UPDATE: I just discovered the performance critical issue on

[jira] [Commented] (SPARK-20568) Delete files after processing in structured streaming

2018-11-05 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675833#comment-16675833 ] Jungtaek Lim commented on SPARK-20568: -- FYI, trait 'Source' has 'commit' method which is called

[jira] [Commented] (SPARK-25937) Support user-defined schema in Kafka Source & Sink

2018-11-05 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676098#comment-16676098 ] Jungtaek Lim commented on SPARK-25937: -- Could you share what is the issue while crafting

[jira] [Commented] (SPARK-25937) Support user-defined schema in Kafka Source & Sink

2018-11-04 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674665#comment-16674665 ] Jungtaek Lim commented on SPARK-25937: -- Another thought for my side: maybe we can classify various

[jira] [Commented] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory

2018-09-27 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631295#comment-16631295 ] Jungtaek Lim commented on SPARK-25380: -- IMHO it depends on how we see the issue and how we would

[jira] [Commented] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory

2018-09-27 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631313#comment-16631313 ] Jungtaek Lim commented on SPARK-25380: -- Btw, reproducer still helps when we tackle it with only UI

[jira] [Comment Edited] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory

2018-09-27 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631295#comment-16631295 ] Jungtaek Lim edited comment on SPARK-25380 at 9/28/18 3:41 AM: --- IMHO it

[jira] [Commented] (SPARK-22565) Session-based windowing

2018-09-27 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631301#comment-16631301 ] Jungtaek Lim commented on SPARK-22565: -- [~XuanYuan] Hello, I've been initiating supporting this

[jira] [Commented] (SPARK-24630) SPIP: Support SQLStreaming in Spark

2018-09-28 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631550#comment-16631550 ] Jungtaek Lim commented on SPARK-24630: -- I think it would be better to describe actual queries (any
