[jira] [Created] (SPARK-20464) Add a job group and an informative job description for streaming queries

2017-04-25 Thread Kunal Khamar (JIRA)
Kunal Khamar created SPARK-20464: Summary: Add a job group and an informative job description for streaming queries Key: SPARK-20464 URL: https://issues.apache.org/jira/browse/SPARK-20464 Project:

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983820#comment-15983820 ] Michael Armbrust commented on SPARK-18057: -- I guess I'd like to understand more about what

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Helena Edelson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983832#comment-15983832 ] Helena Edelson commented on SPARK-18057: It is the timeout. I think waiting is better, will be

[jira] [Updated] (SPARK-20456) Add examples for functions collection for pyspark

2017-04-25 Thread Michael Patterson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Patterson updated SPARK-20456: -- Summary: Add examples for functions collection for pyspark (was: Document major

[jira] [Updated] (SPARK-20456) Add examples for functions collection for pyspark

2017-04-25 Thread Michael Patterson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Patterson updated SPARK-20456: -- Description: Document `sql.functions.py`: 1. Add examples for the common aggregate

[jira] [Updated] (SPARK-20456) Add examples for functions collection for pyspark

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20456: - Component/s: PySpark > Add examples for functions collection for pyspark >

[jira] [Updated] (SPARK-18127) Add hooks and extension points to Spark

2017-04-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-18127: Fix Version/s: 2.2.0 > Add hooks and extension points to Spark > --- >

[jira] [Resolved] (SPARK-18127) Add hooks and extension points to Spark

2017-04-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-18127. - Resolution: Fixed > Add hooks and extension points to Spark > --- >

[jira] [Commented] (SPARK-18127) Add hooks and extension points to Spark

2017-04-25 Thread Frederick Reiss (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983872#comment-15983872 ] Frederick Reiss commented on SPARK-18127: - Is there a design document or a public design and

[jira] [Commented] (SPARK-20456) Document major aggregation functions for pyspark

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983888#comment-15983888 ] Hyukjin Kwon commented on SPARK-20456: -- I simply left the comment above as the current status does

[jira] [Updated] (SPARK-20464) Add a job group and an informative description for streaming queries

2017-04-25 Thread Kunal Khamar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khamar updated SPARK-20464: - Summary: Add a job group and an informative description for streaming queries (was: Add a job

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Helena Edelson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983796#comment-15983796 ] Helena Edelson commented on SPARK-18057: I have a branch off branch-2.2 with the 0.10.2.0 upgrade

[jira] [Commented] (SPARK-20445) pyspark.sql.utils.IllegalArgumentException: u'DecisionTreeClassifier was given input with invalid label column label, without the number of classes specified. See Stri

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983892#comment-15983892 ] Hyukjin Kwon commented on SPARK-20445: -- I meant the current codebase, latest build. Probably, I

[jira] [Resolved] (SPARK-20457) Spark CSV is not able to Override Schema while reading data

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-20457. -- Resolution: Duplicate Currently, the nullability seems to be ignored. I am pretty sure that it

[jira] [Assigned] (SPARK-20464) Add a job group and an informative description for streaming queries

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20464: Assignee: (was: Apache Spark) > Add a job group and an informative description for

[jira] [Assigned] (SPARK-20464) Add a job group and an informative description for streaming queries

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20464: Assignee: Apache Spark > Add a job group and an informative description for streaming

[jira] [Resolved] (SPARK-20130) Flaky test: BlockManagerProactiveReplicationSuite

2017-04-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-20130. Resolution: Cannot Reproduce Seems a lot more stable now, so closing this until it becomes

[jira] [Assigned] (SPARK-20421) Mark JobProgressListener (and related classes) as deprecated

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20421: Assignee: (was: Apache Spark) > Mark JobProgressListener (and related classes) as

[jira] [Assigned] (SPARK-20421) Mark JobProgressListener (and related classes) as deprecated

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20421: Assignee: Apache Spark > Mark JobProgressListener (and related classes) as deprecated >

[jira] [Commented] (SPARK-20336) spark.read.csv() with wholeFile=True option fails to read non ASCII unicode characters

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983896#comment-15983896 ] Hyukjin Kwon commented on SPARK-20336: -- Thank you guys for confirming this. > spark.read.csv() with

[jira] [Updated] (SPARK-20439) Catalog.listTables() depends on all libraries used to create tables

2017-04-25 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20439: Fix Version/s: 2.1.1 > Catalog.listTables() depends on all libraries used to create tables >

[jira] [Created] (SPARK-20465) Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created

2017-04-25 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-20465: Summary: Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created Key: SPARK-20465 URL:

[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter

2017-04-25 Thread 颜发才
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984161#comment-15984161 ] Yan Facai (颜发才) commented on SPARK-20199: - The work is easy, however Public method is added and

[jira] [Resolved] (SPARK-20437) R wrappers for rollup and cube

2017-04-25 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-20437. -- Resolution: Fixed Assignee: Maciej Szymkiewicz Fix Version/s: 2.3.0

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984194#comment-15984194 ] Liang-Chi Hsieh commented on SPARK-20392: - By disabling

[jira] [Updated] (SPARK-20465) Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20465: - Component/s: Spark Core > Throws a proper exception rather than ArrayIndexOutOfBoundsException

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984173#comment-15984173 ] Liang-Chi Hsieh commented on SPARK-20392: - [~barrybecker4] Currently I think the performance

[jira] [Resolved] (SPARK-16548) java.io.CharConversionException: Invalid UTF-32 character prevents me from querying my data

2017-04-25 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-16548. - Resolution: Fixed Fix Version/s: 2.3.0 2.2.0 >

[jira] [Assigned] (SPARK-20465) Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20465: Assignee: Apache Spark > Throws a proper exception rather than

[jira] [Commented] (SPARK-20465) Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984180#comment-15984180 ] Apache Spark commented on SPARK-20465: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-20465) Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20465: Assignee: (was: Apache Spark) > Throws a proper exception rather than

[jira] [Updated] (SPARK-20239) Improve HistoryServer ACL mechanism

2017-04-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-20239: --- Fix Version/s: 2.1.2 2.0.3 > Improve HistoryServer ACL mechanism >

[jira] [Commented] (SPARK-20464) Add a job group and an informative description for streaming queries

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983751#comment-15983751 ] Apache Spark commented on SPARK-20464: -- User 'kunalkhamar' has created a pull request for this

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983809#comment-15983809 ] Shixiong Zhu commented on SPARK-18057: -- I prefer to just wait. The user can still use Kafka 0.10.2.0

[jira] [Commented] (SPARK-20421) Mark JobProgressListener (and related classes) as deprecated

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983808#comment-15983808 ] Apache Spark commented on SPARK-20421: -- User 'vanzin' has created a pull request for this issue:

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Ismael Juma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983853#comment-15983853 ] Ismael Juma commented on SPARK-18057: - It's worth noting that no-one is working on that ticket at the

[jira] [Created] (SPARK-20461) CachedKafkaConsumer may hang forever when it's interrupted

2017-04-25 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-20461: Summary: CachedKafkaConsumer may hang forever when it's interrupted Key: SPARK-20461 URL: https://issues.apache.org/jira/browse/SPARK-20461 Project: Spark

[jira] [Commented] (SPARK-20447) spark mesos scheduler suppress call

2017-04-25 Thread Michael Gummelt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983314#comment-15983314 ] Michael Gummelt commented on SPARK-20447: - The scheduler doesn't support suppression, no, but it

[jira] [Resolved] (SPARK-20449) Upgrade breeze version to 0.13.1

2017-04-25 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-20449. - Resolution: Fixed Fix Version/s: 2.2.0 3.0.0 Issue resolved by pull request

[jira] [Commented] (SPARK-20461) CachedKafkaConsumer may hang forever when it's interrupted

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983387#comment-15983387 ] Apache Spark commented on SPARK-20461: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Resolved] (SPARK-5484) Pregel should checkpoint periodically to avoid StackOverflowError

2017-04-25 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-5484. - Resolution: Fixed Assignee: dingding (was: Ankur Dave) Fix Version/s:

[jira] [Assigned] (SPARK-20461) CachedKafkaConsumer may hang forever when it's interrupted

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20461: Assignee: Apache Spark > CachedKafkaConsumer may hang forever when it's interrupted >

[jira] [Assigned] (SPARK-20461) CachedKafkaConsumer may hang forever when it's interrupted

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20461: Assignee: (was: Apache Spark) > CachedKafkaConsumer may hang forever when it's

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983396#comment-15983396 ] Shixiong Zhu commented on SPARK-18057: -- [~guozhang] We have a stress test to test Spark Kafka

[jira] [Comment Edited] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983396#comment-15983396 ] Shixiong Zhu edited comment on SPARK-18057 at 4/25/17 6:29 PM: --- [~guozhang]

[jira] [Updated] (SPARK-20459) JdbcUtils throws IllegalStateException: Cause already initialized after getting SQLException

2017-04-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20459: Target Version/s: 2.2.0 > JdbcUtils throws IllegalStateException: Cause already initialized after >

[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2017-04-25 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983440#comment-15983440 ] Xiao Li commented on SPARK-20427: - cc [~tsuresh] Are you interested in this? > Issue with Spark

[jira] [Commented] (SPARK-20439) Catalog.listTables() depends on all libraries used to create tables

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983224#comment-15983224 ] Apache Spark commented on SPARK-20439: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Commented] (SPARK-9103) Tracking spark's memory usage

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983531#comment-15983531 ] Apache Spark commented on SPARK-9103: - User 'jsoltren' has created a pull request for this issue:

[jira] [Commented] (SPARK-20456) Document major aggregation functions for pyspark

2017-04-25 Thread Michael Patterson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983590#comment-15983590 ] Michael Patterson commented on SPARK-20456: --- I saw that there are short docstrings for the

[jira] [Created] (SPARK-20462) Spark-Kinesis Direct Connector

2017-04-25 Thread Lauren Moos (JIRA)
Lauren Moos created SPARK-20462: --- Summary: Spark-Kinesis Direct Connector Key: SPARK-20462 URL: https://issues.apache.org/jira/browse/SPARK-20462 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-04-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13747: - Fix Version/s: (was: 2.2.0) > Concurrent execution in SQL doesn't work with Scala

[jira] [Reopened] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-04-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-13747: -- > Concurrent execution in SQL doesn't work with Scala ForkJoinPool >

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983612#comment-15983612 ] Apache Spark commented on SPARK-13747: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Assigned] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13747: Assignee: Apache Spark (was: Shixiong Zhu) > Concurrent execution in SQL doesn't work

[jira] [Assigned] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13747: Assignee: Shixiong Zhu (was: Apache Spark) > Concurrent execution in SQL doesn't work

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-04-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983615#comment-15983615 ] Shixiong Zhu commented on SPARK-13747: -- [~dnaumenko] Unfortunately, Spark uses ThreadLocal variables

[jira] [Commented] (SPARK-20392) Slow performance when calling fit on ML pipeline for dataset with many columns but few rows

2017-04-25 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983490#comment-15983490 ] Kazuaki Ishizaki commented on SPARK-20392: -- Here are my observations: According to

[jira] [Created] (SPARK-20463) Expose SPARK SQL <=> operator in PySpark

2017-04-25 Thread Michael Styles (JIRA)
Michael Styles created SPARK-20463: -- Summary: Expose SPARK SQL <=> operator in PySpark Key: SPARK-20463 URL: https://issues.apache.org/jira/browse/SPARK-20463 Project: Spark Issue Type:

[jira] [Commented] (SPARK-20463) Expose SPARK SQL <=> operator in PySpark

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983654#comment-15983654 ] Apache Spark commented on SPARK-20463: -- User 'ptkool' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20463) Expose SPARK SQL <=> operator in PySpark

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20463: Assignee: Apache Spark > Expose SPARK SQL <=> operator in PySpark >

[jira] [Assigned] (SPARK-20463) Expose SPARK SQL <=> operator in PySpark

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20463: Assignee: (was: Apache Spark) > Expose SPARK SQL <=> operator in PySpark >

[jira] [Assigned] (SPARK-20463) Expose SPARK SQL <=> operator in PySpark

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20463: Assignee: (was: Apache Spark) > Expose SPARK SQL <=> operator in PySpark >

[jira] [Assigned] (SPARK-20463) Expose SPARK SQL <=> operator in PySpark

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20463: Assignee: Apache Spark > Expose SPARK SQL <=> operator in PySpark >

[jira] [Created] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-25 Thread Armin Braun (JIRA)
Armin Braun created SPARK-20455: --- Summary: Missing Test Target in Documentation for "Running Docker-based Integration Test Suites" Key: SPARK-20455 URL: https://issues.apache.org/jira/browse/SPARK-20455

[jira] [Commented] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982420#comment-15982420 ] Apache Spark commented on SPARK-20455: -- User 'original-brownbear' has created a pull request for

[jira] [Assigned] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20455: Assignee: (was: Apache Spark) > Missing Test Target in Documentation for "Running

[jira] [Assigned] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20455: Assignee: Apache Spark > Missing Test Target in Documentation for "Running Docker-based

[jira] [Created] (SPARK-20456) Document major aggregation functions for pyspark

2017-04-25 Thread Michael Patterson (JIRA)
Michael Patterson created SPARK-20456: - Summary: Document major aggregation functions for pyspark Key: SPARK-20456 URL: https://issues.apache.org/jira/browse/SPARK-20456 Project: Spark

[jira] [Resolved] (SPARK-20445) pyspark.sql.utils.IllegalArgumentException: u'DecisionTreeClassifier was given input with invalid label column label, without the number of classes specified. See Strin

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-20445. -- Resolution: Cannot Reproduce I am resolving this as I can't reproduce this in the current

[jira] [Commented] (SPARK-20369) pyspark: Dynamic configuration with SparkConf does not work

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982489#comment-15982489 ] Hyukjin Kwon commented on SPARK-20369: -- It looks like I can't reproduce this, as below: {code} from

[jira] [Comment Edited] (SPARK-20369) pyspark: Dynamic configuration with SparkConf does not work

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982489#comment-15982489 ] Hyukjin Kwon edited comment on SPARK-20369 at 4/25/17 7:24 AM: --- It looks like I

[jira] [Updated] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20455: -- Priority: Trivial (was: Minor) > Missing Test Target in Documentation for "Running Docker-based

[jira] [Commented] (SPARK-20456) Document major aggregation functions for pyspark

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982507#comment-15982507 ] Hyukjin Kwon commented on SPARK-20456: -- > Document `sql.functions.py`: 1. Document the common

[jira] [Comment Edited] (SPARK-20456) Document major aggregation functions for pyspark

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982507#comment-15982507 ] Hyukjin Kwon edited comment on SPARK-20456 at 4/25/17 7:37 AM: --- {quote}

[jira] [Updated] (SPARK-20456) Document major aggregation functions for pyspark

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20456: - Priority: Minor (was: Major) > Document major aggregation functions for pyspark >

[jira] [Assigned] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20455: - Assignee: Armin Braun > Missing Test Target in Documentation for "Running Docker-based

[jira] [Resolved] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20455. --- Resolution: Fixed Fix Version/s: 2.1.2 2.2.0 Issue resolved by pull

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2017-04-25 Thread sskadarkar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982550#comment-15982550 ] sskadarkar commented on SPARK-18492: [~tdas] I am also getting the same error which Rupinder has

[jira] [Assigned] (SPARK-20404) Regression with accumulator names when migrating from 1.6 to 2.x

2017-04-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20404: - Assignee: Sergey Zhemzhitsky Priority: Minor (was: Major) Issue Type: Improvement

[jira] [Resolved] (SPARK-20404) Regression with accumulator names when migrating from 1.6 to 2.x

2017-04-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20404. --- Resolution: Fixed Fix Version/s: 2.1.2 2.2.0 Issue resolved by pull

[jira] [Commented] (SPARK-7481) Add spark-hadoop-cloud module to pull in object store support

2017-04-25 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982587#comment-15982587 ] Steven Rand commented on SPARK-7481: What happened to https://github.com/apache/spark/pull/12004? It

[jira] [Commented] (SPARK-7481) Add spark-hadoop-cloud module to pull in object store support

2017-04-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982591#comment-15982591 ] Sean Owen commented on SPARK-7481: -- I don't believe my last round of comments were addressed, and it was

[jira] [Commented] (SPARK-17403) Fatal Error: Scan cached strings

2017-04-25 Thread Paul Lysak (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982595#comment-15982595 ] Paul Lysak commented on SPARK-17403: Looks like we have the same issue with Spark 2.1 on YARN (Amazon

[jira] [Commented] (SPARK-20336) spark.read.csv() with wholeFile=True option fails to read non ASCII unicode characters

2017-04-25 Thread Armin Braun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982669#comment-15982669 ] Armin Braun commented on SPARK-20336: - [~priancho] my bad apparently in the above. I can't retrace

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-25 Thread Ismael Juma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982683#comment-15982683 ] Ismael Juma commented on SPARK-18057: - Thanks for the clarification [~zsxwing], that's helpful. >

[jira] [Commented] (SPARK-13857) Feature parity for ALS ML with MLLIB

2017-04-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982695#comment-15982695 ] Nick Pentreath commented on SPARK-13857: I'm going to close this as superseded by SPARK-19535.

[jira] [Closed] (SPARK-13857) Feature parity for ALS ML with MLLIB

2017-04-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-13857. -- Resolution: Duplicate > Feature parity for ALS ML with MLLIB >

[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly

2017-04-25 Thread kanika dhuria (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanika dhuria updated SPARK-17922: -- Attachment: spark_17922.tar.gz Repro case > ClassCastException java.lang.ClassCastException:

[jira] [Commented] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cata

2017-04-25 Thread kanika dhuria (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982700#comment-15982700 ] kanika dhuria commented on SPARK-17922: --- Hi, I have attached the repro case for this issue. The

[jira] [Commented] (SPARK-20336) spark.read.csv() with wholeFile=True option fails to read non ASCII unicode characters

2017-04-25 Thread HanCheol Cho (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982702#comment-15982702 ] HanCheol Cho commented on SPARK-20336: -- Thank you for your additional test [~original-brownbear]. I

[jira] [Closed] (SPARK-20336) spark.read.csv() with wholeFile=True option fails to read non ASCII unicode characters

2017-04-25 Thread HanCheol Cho (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HanCheol Cho closed SPARK-20336. Resolution: Not A Bug > spark.read.csv() with wholeFile=True option fails to read non ASCII

[jira] [Created] (SPARK-20457) Spark CSV is not able to Override Schema while reading data

2017-04-25 Thread Himanshu Gupta (JIRA)
Himanshu Gupta created SPARK-20457: -- Summary: Spark CSV is not able to Override Schema while reading data Key: SPARK-20457 URL: https://issues.apache.org/jira/browse/SPARK-20457 Project: Spark

[jira] [Updated] (SPARK-20457) Spark CSV is not able to Override Schema while reading data

2017-04-25 Thread Himanshu Gupta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Gupta updated SPARK-20457: --- Description: I have a CSV file, test.csv: {code:xml} col 1 2 3 4 {code} When I read it

[jira] [Commented] (SPARK-20445) pyspark.sql.utils.IllegalArgumentException: u'DecisionTreeClassifier was given input with invalid label column label, without the number of classes specified. See Stri

2017-04-25 Thread surya pratap (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982830#comment-15982830 ] surya pratap commented on SPARK-20445: -- Hello Hyukjin Kwon Thxz for reply. I tried many times but

[jira] [Commented] (SPARK-20445) pyspark.sql.utils.IllegalArgumentException: u'DecisionTreeClassifier was given input with invalid label column label, without the number of classes specified. See Stri

2017-04-25 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982852#comment-15982852 ] Hyukjin Kwon commented on SPARK-20445: -- Are you maybe able to try this against the current master or

[jira] [Created] (SPARK-20458) support getting Yarn Tracking URL in code

2017-04-25 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-20458: -- Summary: support getting Yarn Tracking URL in code Key: SPARK-20458 URL: https://issues.apache.org/jira/browse/SPARK-20458 Project: Spark Issue Type:

[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

2017-04-25 Thread Dmitry Naumenko (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982949#comment-15982949 ] Dmitry Naumenko commented on SPARK-13747: - [~zsxwing] I did a similar test with join and have the

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983018#comment-15983018 ] Nick Pentreath commented on SPARK-20446: By the way when I say it is a duplicate I mean for the

[jira] [Commented] (SPARK-11968) ALS recommend all methods spend most of time in GC

2017-04-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983026#comment-15983026 ] Nick Pentreath commented on SPARK-11968: Note, there is a solution proposed in SPARK-20446. I've
