[jira] [Commented] (SPARK-22570) Cast may create a lot of UTF8String.IntWrapper or UTF8String.longWrapper instances

2017-11-20 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260363#comment-16260363 ] Kazuaki Ishizaki commented on SPARK-22570: -- I am working on this with SPARK-22500 > Cast may

[jira] [Created] (SPARK-22571) How to connect to secure(Kerberos) kafka broker using native KafkaConsumer api in Spark Streamming application

2017-11-20 Thread Ujjal Satpathy (JIRA)
Ujjal Satpathy created SPARK-22571: -- Summary: How to connect to secure(Kerberos) kafka broker using native KafkaConsumer api in Spark Streamming application Key: SPARK-22571 URL:

[jira] [Created] (SPARK-22570) Cast may create a lot of UTF8String.IntWrapper or UTF8String.longWrapper instances

2017-11-20 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-22570: Summary: Cast may create a lot of UTF8String.IntWrapper or UTF8String.longWrapper instances Key: SPARK-22570 URL: https://issues.apache.org/jira/browse/SPARK-22570

[jira] [Assigned] (SPARK-22569) Clean up caller of splitExpressions and addMutableState

2017-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22569: Assignee: Apache Spark (was: Xiao Li) > Clean up caller of splitExpressions and

[jira] [Assigned] (SPARK-22569) Clean up caller of splitExpressions and addMutableState

2017-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22569: Assignee: Xiao Li (was: Apache Spark) > Clean up caller of splitExpressions and

[jira] [Commented] (SPARK-22569) Clean up caller of splitExpressions and addMutableState

2017-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260284#comment-16260284 ] Apache Spark commented on SPARK-22569: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Created] (SPARK-22569) Clean up caller of splitExpressions and addMutableState

2017-11-20 Thread Xiao Li (JIRA)
Xiao Li created SPARK-22569: --- Summary: Clean up caller of splitExpressions and addMutableState Key: SPARK-22569 URL: https://issues.apache.org/jira/browse/SPARK-22569 Project: Spark Issue Type:

[jira] [Commented] (SPARK-22556) WrappedArray with Explode Function create WrappedArray with 1 object.

2017-11-20 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260241#comment-16260241 ] Hyukjin Kwon commented on SPARK-22556: -- So, it looks like it shows the results as below: {code} scala>

[jira] [Updated] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22541: Component/s: (was: Documentation) > Dataframes: applying multiple filters one after

[jira] [Commented] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260240#comment-16260240 ] Liang-Chi Hsieh commented on SPARK-22541: - Since this is known behavior, I will change this from

[jira] [Updated] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22541: Issue Type: Documentation (was: Bug) > Dataframes: applying multiple filters one after

[jira] [Updated] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22541: Component/s: Documentation > Dataframes: applying multiple filters one after another using

[jira] [Resolved] (SPARK-22568) Split pair RDDs by keys - an efficient (maybe?) substitute to groupByKey

2017-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22568. --- Resolution: Not A Problem I think this is more of a usage question, so belongs on the mailing list.

[jira] [Updated] (SPARK-22568) Split pair RDDs by keys - an efficient (maybe?) substitute to groupByKey

2017-11-20 Thread Éderson Cássio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Éderson Cássio updated SPARK-22568: --- Description: Sorry for any mistakes on filling this big form... it's my first issue here :)

[jira] [Created] (SPARK-22568) Split pair RDDs by keys - an efficient (maybe?) substitute to groupByKey

2017-11-20 Thread Éderson Cássio (JIRA)
Éderson Cássio created SPARK-22568: -- Summary: Split pair RDDs by keys - an efficient (maybe?) substitute to groupByKey Key: SPARK-22568 URL: https://issues.apache.org/jira/browse/SPARK-22568

[jira] [Comment Edited] (SPARK-21322) support histogram in filter cardinality estimation

2017-11-20 Thread Ron Hu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258601#comment-16258601 ] Ron Hu edited comment on SPARK-21322 at 11/21/17 1:43 AM: -- Pull request 19357

[jira] [Assigned] (SPARK-22549) 64KB JVM bytecode limit problem with concat_ws

2017-11-20 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22549: --- Assignee: Kazuaki Ishizaki > 64KB JVM bytecode limit problem with concat_ws >

[jira] [Resolved] (SPARK-22549) 64KB JVM bytecode limit problem with concat_ws

2017-11-20 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22549. - Resolution: Fixed Fix Version/s: 2.3.0 2.2.2 Issue resolved by pull

[jira] [Commented] (SPARK-20133) User guide for spark.ml.stat.ChiSquareTest

2017-11-20 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260095#comment-16260095 ] Teng Peng commented on SPARK-20133: --- I believe the documentation, including user guide and example

[jira] [Resolved] (SPARK-22449) Add BIC for GLM

2017-11-20 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng resolved SPARK-22449. --- Resolution: Later > Add BIC for GLM > --- > > Key: SPARK-22449 >

[jira] [Commented] (SPARK-12748) Failed to create HiveContext in SparkSql

2017-11-20 Thread Jeffrey E Rodriguez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260023#comment-16260023 ] Jeffrey E Rodriguez commented on SPARK-12748: -- Sorry Ajesh, how would setting a remote

[jira] [Created] (SPARK-22567) spark.mesos.executor.memoryOverhead equivalent for the Driver when running on Mesos

2017-11-20 Thread Michael Moss (JIRA)
Michael Moss created SPARK-22567: Summary: spark.mesos.executor.memoryOverhead equivalent for the Driver when running on Mesos Key: SPARK-22567 URL: https://issues.apache.org/jira/browse/SPARK-22567

[jira] [Commented] (SPARK-21097) Dynamic allocation will preserve cached data

2017-11-20 Thread Arun Suresh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259789#comment-16259789 ] Arun Suresh commented on SPARK-21097: - Thanks for raising this. I would also like to call attention

[jira] [Updated] (SPARK-22564) csv reader no longer logs errors

2017-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-22564: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) I don't see it used anywhere,

[jira] [Commented] (SPARK-22566) Better error message for `_merge_type` in Pandas to Spark DF conversion

2017-11-20 Thread Guilherme Berger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259679#comment-16259679 ] Guilherme Berger commented on SPARK-22566: -- Relevant:

[jira] [Created] (SPARK-22566) Better error message for `_merge_type` in Pandas to Spark DF conversion

2017-11-20 Thread Guilherme Berger (JIRA)
Guilherme Berger created SPARK-22566: Summary: Better error message for `_merge_type` in Pandas to Spark DF conversion Key: SPARK-22566 URL: https://issues.apache.org/jira/browse/SPARK-22566

[jira] [Created] (SPARK-22565) Session-based windowing

2017-11-20 Thread Richard Xin (JIRA)
Richard Xin created SPARK-22565: --- Summary: Session-based windowing Key: SPARK-22565 URL: https://issues.apache.org/jira/browse/SPARK-22565 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-22427) StackOverFlowError when using FPGrowth

2017-11-20 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259587#comment-16259587 ] yuhao yang commented on SPARK-22427: I tried with larger scale data but did not repro the issue.

[jira] [Commented] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Ben (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259517#comment-16259517 ] Ben commented on SPARK-22563: - I cannot give you the actual data, but in this case it would be e.g.:

[jira] [Assigned] (SPARK-22562) CachedKafkaConsumer unsafe eviction from cache

2017-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22562: Assignee: (was: Apache Spark) > CachedKafkaConsumer unsafe eviction from cache >

[jira] [Assigned] (SPARK-22562) CachedKafkaConsumer unsafe eviction from cache

2017-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22562: Assignee: Apache Spark > CachedKafkaConsumer unsafe eviction from cache >

[jira] [Commented] (SPARK-22562) CachedKafkaConsumer unsafe eviction from cache

2017-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259489#comment-16259489 ] Apache Spark commented on SPARK-22562: -- User 'daroo' has created a pull request for this issue:

[jira] [Created] (SPARK-22564) csv reader no longer logs errors

2017-11-20 Thread Adrian Bridgett (JIRA)
Adrian Bridgett created SPARK-22564: --- Summary: csv reader no longer logs errors Key: SPARK-22564 URL: https://issues.apache.org/jira/browse/SPARK-22564 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-22558) SparkHiveDynamicPartition fails when trying to write data from kafka to hive using spark streaming

2017-11-20 Thread KhajaAsmath Mohammed (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259472#comment-16259472 ] KhajaAsmath Mohammed commented on SPARK-22558: -- any suggestions for this issue? >

[jira] [Commented] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259465#comment-16259465 ] Sean Owen commented on SPARK-22563: --- Can you give an example of the output you see? > Spark

[jira] [Updated] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Ben (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben updated SPARK-22563: Description: I have a large and complex DataFrame with nested structures in Spark 2.1.0 (pySpark) and I want to

[jira] [Comment Edited] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Ben (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259367#comment-16259367 ] Ben edited comment on SPARK-22563 at 11/20/17 3:34 PM: --- Hi [~srowen], The example

[jira] [Commented] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Ben (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259367#comment-16259367 ] Ben commented on SPARK-22563: - The example is what I would logically expect, but in reality, the ID column

[jira] [Commented] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259324#comment-16259324 ] Sean Owen commented on SPARK-22563: --- I'm not seeing the difference you refer to. The IDs remain the

[jira] [Updated] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Ben (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben updated SPARK-22563: Component/s: Shuffle PySpark > Spark row_number() deterministic generation and materialization as

[jira] [Created] (SPARK-22563) Spark row_number() deterministic generation and materialization as a checkpoint

2017-11-20 Thread Ben (JIRA)
Ben created SPARK-22563: --- Summary: Spark row_number() deterministic generation and materialization as a checkpoint Key: SPARK-22563 URL: https://issues.apache.org/jira/browse/SPARK-22563 Project: Spark

[jira] [Created] (SPARK-22562) CachedKafkaConsumer unsafe eviction from cache

2017-11-20 Thread Dariusz Szablinski (JIRA)
Dariusz Szablinski created SPARK-22562: -- Summary: CachedKafkaConsumer unsafe eviction from cache Key: SPARK-22562 URL: https://issues.apache.org/jira/browse/SPARK-22562 Project: Spark

[jira] [Commented] (SPARK-22560) Must create spark session directly to connect to hive

2017-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259209#comment-16259209 ] Sean Owen commented on SPARK-22560: --- I think you need to set these before creating the context? > Must

[jira] [Updated] (SPARK-22516) CSV Read breaks: When "multiLine" = "true", if "comment" option is set as last line's first character

2017-11-20 Thread Kumaresh C R (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kumaresh C R updated SPARK-22516: - Attachment: test_file_without_eof_char.csv > CSV Read breaks: When "multiLine" = "true", if

[jira] [Commented] (SPARK-22516) CSV Read breaks: When "multiLine" = "true", if "comment" option is set as last line's first character

2017-11-20 Thread Kumaresh C R (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259157#comment-16259157 ] Kumaresh C R commented on SPARK-22516: -- [~mgaido]: Even after I replaced all 'CR LF' to 'LF', still

[jira] [Assigned] (SPARK-22533) SparkConfigProvider does not handle deprecated config keys

2017-11-20 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22533: --- Assignee: Marcelo Vanzin > SparkConfigProvider does not handle deprecated config keys >

[jira] [Resolved] (SPARK-22533) SparkConfigProvider does not handle deprecated config keys

2017-11-20 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22533. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19760

[jira] [Resolved] (SPARK-20101) Use OffHeapColumnVector when "spark.memory.offHeap.enabled" is set to "true"

2017-11-20 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20101. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 17436

[jira] [Assigned] (SPARK-20101) Use OffHeapColumnVector when "spark.sql.columnVector.offheap.enable" is set to "true"

2017-11-20 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-20101: --- Assignee: Kazuaki Ishizaki > Use OffHeapColumnVector when

[jira] [Updated] (SPARK-20101) Use OffHeapColumnVector when "spark.sql.columnVector.offheap.enable" is set to "true"

2017-11-20 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20101: Summary: Use OffHeapColumnVector when "spark.sql.columnVector.offheap.enable" is set to "true"

[jira] [Created] (SPARK-22561) Dynamically update topics list for spark kafka consumer

2017-11-20 Thread Arun (JIRA)
Arun created SPARK-22561: Summary: Dynamically update topics list for spark kafka consumer Key: SPARK-22561 URL: https://issues.apache.org/jira/browse/SPARK-22561 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-22560) Must create spark session directly to connect to hive

2017-11-20 Thread Ran Mingxuan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ran Mingxuan updated SPARK-22560: - Description: In a java project I have to use both JavaSparkContext and SparkSession. I find

[jira] [Updated] (SPARK-22560) Must create spark session directly to connect to hive

2017-11-20 Thread Ran Mingxuan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ran Mingxuan updated SPARK-22560: - Description: I have built a spark job like below: {code:java} // wrong code public void

[jira] [Created] (SPARK-22560) Must create spark session directly to connect to hive

2017-11-20 Thread Ran Mingxuan (JIRA)
Ran Mingxuan created SPARK-22560: Summary: Must create spark session directly to connect to hive Key: SPARK-22560 URL: https://issues.apache.org/jira/browse/SPARK-22560 Project: Spark Issue