[jira] [Commented] (SPARK-18508) Fix documentation for DateDiff

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678731#comment-15678731 ] Apache Spark commented on SPARK-18508: -- User 'rxin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18508) Fix documentation for DateDiff

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18508: Assignee: Reynold Xin (was: Apache Spark) > Fix documentation for DateDiff >

[jira] [Assigned] (SPARK-18508) Fix documentation for DateDiff

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18508: Assignee: Apache Spark (was: Reynold Xin) > Fix documentation for DateDiff >

[jira] [Created] (SPARK-18508) Fix documentation for DateDiff

2016-11-18 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-18508: --- Summary: Fix documentation for DateDiff Key: SPARK-18508 URL: https://issues.apache.org/jira/browse/SPARK-18508 Project: Spark Issue Type: Bug

[jira] [Closed] (SPARK-18089) Remove CollectLimitExec operator

2016-11-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-18089. --- Resolution: Won't Fix > Remove CollectLimitExec operator >

[jira] [Commented] (SPARK-18319) ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678230#comment-15678230 ] Joseph K. Bradley commented on SPARK-18319: --- I'd prefer not to open up Vector and Matrix.

[jira] [Commented] (SPARK-18319) ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678231#comment-15678231 ] Joseph K. Bradley commented on SPARK-18319: --- Thanks [~yuhaoyan] for the audit! > ML, Graph 2.1

[jira] [Resolved] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-18 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-18497. --- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15934

[jira] [Resolved] (SPARK-18505) Simplify AnalyzeColumnCommand

2016-11-18 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-18505. - Resolution: Fixed Fix Version/s: 2.1.0 > Simplify AnalyzeColumnCommand >

[jira] [Commented] (SPARK-18319) ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678218#comment-15678218 ] Joseph K. Bradley commented on SPARK-18319: --- I agree with the "probably ready to be unmarked"

[jira] [Comment Edited] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678215#comment-15678215 ] Michael Allman edited comment on SPARK-18507 at 11/19/16 12:30 AM: --- CC

[jira] [Commented] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678215#comment-15678215 ] Michael Allman commented on SPARK-18507: CC [~ekhliang] > Major performance regression in SHOW

[jira] [Created] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-18 Thread Michael Allman (JIRA)
Michael Allman created SPARK-18507: -- Summary: Major performance regression in SHOW PARTITIONS on partitioned Hive tables Key: SPARK-18507 URL: https://issues.apache.org/jira/browse/SPARK-18507

[jira] [Updated] (SPARK-18506) kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-18 Thread Heji Kim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heji Kim updated SPARK-18506: - Description: Our team is trying to upgrade to Spark 2.0.2/Kafka

[jira] [Commented] (SPARK-18356) Issue + Resolution: Kmeans Spark Performances (ML package)

2016-11-18 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678196#comment-15678196 ] yuhao yang commented on SPARK-18356: Surely I would not mind. You're more than welcome to send a

[jira] [Resolved] (SPARK-18477) Enable interrupts for HDFS in HDFSMetadataLog

2016-11-18 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-18477. --- Resolution: Fixed Fix Version/s: 2.1.0 2.0.3 Issue resolved by

[jira] [Updated] (SPARK-18477) Enable interrupts for HDFS in HDFSMetadataLog

2016-11-18 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-18477: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-8360 > Enable interrupts for

[jira] [Created] (SPARK-18506) kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-18 Thread Heji Kim (JIRA)
Heji Kim created SPARK-18506: Summary: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic Key: SPARK-18506 URL:

[jira] [Closed] (SPARK-11613) Kinesis ASL should allow caller to set ClientConfiguration for socket timeouts and other connection setting

2016-11-18 Thread Heji Kim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heji Kim closed SPARK-11613. Resolution: Fixed > Kinesis ASL should allow caller to set ClientConfiguration for socket > timeouts and

[jira] [Commented] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Nattavut Sutyanyong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678001#comment-15678001 ] Nattavut Sutyanyong commented on SPARK-18504: - While we have SPARK-18455 to track the

[jira] [Assigned] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18504: Assignee: Apache Spark > Scalar subquery with extra group by columns returning incorrect

[jira] [Assigned] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18504: Assignee: (was: Apache Spark) > Scalar subquery with extra group by columns returning

[jira] [Commented] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677991#comment-15677991 ] Apache Spark commented on SPARK-18504: -- User 'nsyca' has created a pull request for this issue:

[jira] [Commented] (SPARK-18188) Add checksum for block of broadcast

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677983#comment-15677983 ] Apache Spark commented on SPARK-18188: -- User 'davies' has created a pull request for this issue:

[jira] [Comment Edited] (SPARK-2984) FileNotFoundException on _temporary directory

2016-11-18 Thread Giuseppe Bonaccorso (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677971#comment-15677971 ] Giuseppe Bonaccorso edited comment on SPARK-2984 at 11/18/16 10:35 PM:

[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2016-11-18 Thread Giuseppe Bonaccorso (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677971#comment-15677971 ] Giuseppe Bonaccorso commented on SPARK-2984: I'm facing the same issue with EMR 5.0.1 with

[jira] [Updated] (SPARK-5992) Locality Sensitive Hashing (LSH)

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-5992: - Summary: Locality Sensitive Hashing (LSH) (was: Locality Sensitive Hashing (LSH) for

[jira] [Updated] (SPARK-18188) Add checksum for block of broadcast

2016-11-18 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-18188: --- Description: There has been an outstanding issue for a long time:

[jira] [Updated] (SPARK-18188) Add checksum for block of broadcast

2016-11-18 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-18188: --- Summary: Add checksum for block of broadcast (was: Add checksum for block in Spark) > Add checksum

[jira] [Closed] (SPARK-18000) Aggregation function for computing bins (distinct value, count) pairs for equi-width histograms

2016-11-18 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-18000. --- Resolution: Won't Fix Marking this as won't fix, since it looks like combination of count-min sketch

[jira] [Updated] (SPARK-16561) Potential numerical problem in MultivariateOnlineSummarizer min/max

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-16561: -- Summary: Potential numerical problem in MultivariateOnlineSummarizer min/max (was:

[jira] [Updated] (SPARK-16831) PySpark CrossValidator reports incorrect avgMetrics

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-16831: -- Summary: PySpark CrossValidator reports incorrect avgMetrics (was: CrossValidator

[jira] [Assigned] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18497: Assignee: Shixiong Zhu (was: Apache Spark) > ForeachSink fails with "assertion failed:

[jira] [Commented] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677932#comment-15677932 ] Apache Spark commented on SPARK-18497: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18497: Assignee: Apache Spark (was: Shixiong Zhu) > ForeachSink fails with "assertion failed:

[jira] [Updated] (SPARK-18505) Simplify AnalyzeColumnCommand

2016-11-18 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-18505: Description: I'm spending more time at the design & code level for cost-based optimizer now, and

[jira] [Commented] (SPARK-18505) Simplify AnalyzeColumnCommand

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677895#comment-15677895 ] Apache Spark commented on SPARK-18505: -- User 'rxin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18505) Simplify AnalyzeColumnCommand

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18505: Assignee: Reynold Xin (was: Apache Spark) > Simplify AnalyzeColumnCommand >

[jira] [Assigned] (SPARK-18505) Simplify AnalyzeColumnCommand

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18505: Assignee: Apache Spark (was: Reynold Xin) > Simplify AnalyzeColumnCommand >

[jira] [Assigned] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-18497: Assignee: Shixiong Zhu > ForeachSink fails with "assertion failed: No plan for

[jira] [Created] (SPARK-18505) Simplify AnalyzeColumnCommand

2016-11-18 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-18505: --- Summary: Simplify AnalyzeColumnCommand Key: SPARK-18505 URL: https://issues.apache.org/jira/browse/SPARK-18505 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-18422) Fix wholeTextFiles test to pass on Windows in JavaAPISuite

2016-11-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18422: -- Assignee: Hyukjin Kwon > Fix wholeTextFiles test to pass on Windows in JavaAPISuite >

[jira] [Commented] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677849#comment-15677849 ] Herman van Hovell commented on SPARK-18504: --- Could you open a PR? > Scalar subquery with extra

[jira] [Resolved] (SPARK-18422) Fix wholeTextFiles test to pass on Windows in JavaAPISuite

2016-11-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18422. --- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15866

[jira] [Commented] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677848#comment-15677848 ] Herman van Hovell commented on SPARK-18504: --- Is this a valid correlated scalar subquery? They

[jira] [Updated] (SPARK-17363) fix MultivariateOnlineSummerizer.numNonZeros

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-17363: -- Summary: fix MultivariateOnlineSummerizer.numNonZeros (was: fix

[jira] [Updated] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Nattavut Sutyanyong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nattavut Sutyanyong updated SPARK-18504: Summary: Scalar subquery with extra group by columns returning incorrect result

[jira] [Commented] (SPARK-18504) Scalar subquery with extra group by columns returning incorrect result

2016-11-18 Thread Nattavut Sutyanyong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677765#comment-15677765 ] Nattavut Sutyanyong commented on SPARK-18504: - // Incorrect result

[jira] [Created] (SPARK-18504) Scalar subquery returning incorrect result

2016-11-18 Thread Nattavut Sutyanyong (JIRA)
Nattavut Sutyanyong created SPARK-18504: --- Summary: Scalar subquery returning incorrect result Key: SPARK-18504 URL: https://issues.apache.org/jira/browse/SPARK-18504 Project: Spark

[jira] [Created] (SPARK-18503) Pre 2.0 spark driver/executor memory default unit is bytes, post 2.0 default unit is MB

2016-11-18 Thread Chris McCubbin (JIRA)
Chris McCubbin created SPARK-18503: -- Summary: Pre 2.0 spark driver/executor memory default unit is bytes, post 2.0 default unit is MB Key: SPARK-18503 URL: https://issues.apache.org/jira/browse/SPARK-18503

[jira] [Updated] (SPARK-18334) What hashDistance should MinHash use?

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18334: -- Target Version/s: (was: 2.1.0) > What hashDistance should MinHash use? >

[jira] [Updated] (SPARK-18334) What hashDistance should MinHash use?

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18334: -- Priority: Minor (was: Trivial) > What hashDistance should MinHash use? >

[jira] [Commented] (SPARK-18334) MinHash should use binary hash distance

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677667#comment-15677667 ] Joseph K. Bradley commented on SPARK-18334: --- Adding a note: Per discussions on PRs, we need to

[jira] [Updated] (SPARK-18334) What hashDistance should MinHash use?

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18334: -- Summary: What hashDistance should MinHash use? (was: MinHash should use binary hash

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-18 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Assignee: Tyson Condie > Don't push down current_timestamp for filters in

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-18 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Target Version/s: 2.1.0 (was: 2.2.0) > Don't push down current_timestamp for filters in

[jira] [Closed] (SPARK-18252) Improve serialized BloomFilter size

2016-11-18 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-18252. --- Resolution: Won't Fix > Improve serialized BloomFilter size > --- >

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-18 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677521#comment-15677521 ] Reynold Xin commented on SPARK-18252: - Thanks - going to close this. > Improve serialized

[jira] [Resolved] (SPARK-18457) ORC and other columnar formats using HiveShim read all columns when doing a simple count

2016-11-18 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-18457. - Resolution: Fixed Assignee: Andrew Ray Fix Version/s: 2.1.0 > ORC and other

[jira] [Resolved] (SPARK-18187) CompactibleFileStreamLog should not rely on "compactInterval" to detect a compaction batch

2016-11-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-18187. -- Resolution: Fixed Assignee: Tyson Condie Fix Version/s: 2.1.0 >

[jira] [Commented] (SPARK-18187) CompactibleFileStreamLog should not rely on "compactInterval" to detect a compaction batch

2016-11-18 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677492#comment-15677492 ] Shixiong Zhu commented on SPARK-18187: -- Resolved by https://github.com/apache/spark/pull/15852 >

[jira] [Closed] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2016-11-18 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell closed SPARK-11785. - Resolution: Fixed Fix Version/s: 2.1.0 > When deployed against remote Hive

[jira] [Commented] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677469#comment-15677469 ] Cheng Lian commented on SPARK-11785: I'm not sure which PR fixes this issue, though. > When

[jira] [Updated] (SPARK-10643) Support remote application download in client mode spark submit

2016-11-18 Thread Michael Gummelt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-10643: Summary: Support remote application download in client mode spark submit (was: Support

[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit

2016-11-18 Thread Michael Gummelt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677467#comment-15677467 ] Michael Gummelt commented on SPARK-10643: - It's not just HDFS. HTTP urls fail as well: {code}

[jira] [Commented] (SPARK-11785) When deployed against remote Hive metastore with lower versions, JDBC metadata calls throws exception

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677468#comment-15677468 ] Cheng Lian commented on SPARK-11785: Confirmed that this is no longer an issue for 2.1 > When

[jira] [Updated] (SPARK-18321) ML 2.1 QA: API: Java compatibility, docs

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18321: -- Assignee: Seth Hendrickson > ML 2.1 QA: API: Java compatibility, docs >

[jira] [Resolved] (SPARK-18321) ML 2.1 QA: API: Java compatibility, docs

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-18321. --- Resolution: Fixed Fix Version/s: 2.1.0 > ML 2.1 QA: API: Java compatibility,

[jira] [Commented] (SPARK-18321) ML 2.1 QA: API: Java compatibility, docs

2016-11-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677435#comment-15677435 ] Joseph K. Bradley commented on SPARK-18321: --- I checked the diff between docs as well and did

[jira] [Comment Edited] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677396#comment-15677396 ] Cheng Lian edited comment on SPARK-18251 at 11/18/16 6:38 PM: -- I'd prefer

[jira] [Comment Edited] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677396#comment-15677396 ] Cheng Lian edited comment on SPARK-18251 at 11/18/16 6:37 PM: -- I'd prefer

[jira] [Commented] (SPARK-18251) DataSet API | RuntimeException: Null value appeared in non-nullable field when holding Option Case Class

2016-11-18 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677396#comment-15677396 ] Cheng Lian commented on SPARK-18251: I'd prefer option 1 because of consistency of the semantics, and

[jira] [Commented] (SPARK-18218) Optimize BlockMatrix multiplication, which may cause OOM and low parallelism usage problem in several cases

2016-11-18 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677367#comment-15677367 ] Burak Yavuz commented on SPARK-18218: - [~WeichenXu123] You are correct, this would be a problem. But

[jira] [Commented] (SPARK-18134) SQL: MapType in Group BY and Joins not working

2016-11-18 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677266#comment-15677266 ] Herman van Hovell commented on SPARK-18134: --- There is not a political reason for doing this.

[jira] [Commented] (SPARK-18134) SQL: MapType in Group BY and Joins not working

2016-11-18 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677252#comment-15677252 ] Herman van Hovell commented on SPARK-18134: --- In both cases you could use sorted arrays of

[jira] [Comment Edited] (SPARK-13913) DataFrame.withColumn fails when trying to replace existing column with dot in name

2016-11-18 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677144#comment-15677144 ] Barry Becker edited comment on SPARK-13913 at 11/18/16 5:02 PM: I can

[jira] [Commented] (SPARK-13913) DataFrame.withColumn fails when trying to replace existing column with dot in name

2016-11-18 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677144#comment-15677144 ] Barry Becker commented on SPARK-13913: -- I can still reproduce this using spark 1.6.3. My dataframe

[jira] [Commented] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-18 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677115#comment-15677115 ] Herman van Hovell commented on SPARK-18249: --- The good news is that this is not a parquet issue.

[jira] [Commented] (SPARK-18202) Spark throws a mysterious system error when a Hive command has at least 100,000 results

2016-11-18 Thread Martin Petricek (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677072#comment-15677072 ] Martin Petricek commented on SPARK-18202: - I encountered the same problem in 1.6.1. It seems that

[jira] [Commented] (SPARK-18134) SQL: MapType in Group BY and Joins not working

2016-11-18 Thread Christian Zorneck (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677059#comment-15677059 ] Christian Zorneck commented on SPARK-18134: --- Because there is no valid workaround, which fits

[jira] [Commented] (SPARK-14155) Hide UserDefinedType in Spark 2.0

2016-11-18 Thread Raghu Ganti (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677052#comment-15677052 ] Raghu Ganti commented on SPARK-14155: - Is there an update re this? Any timeline as to when UDTs will

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-18 Thread Gregory SSI-YAN-KAI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676986#comment-15676986 ] Gregory SSI-YAN-KAI commented on SPARK-18252: - I've worked on a custom implementation of

[jira] [Commented] (SPARK-18496) java.lang.AssertionError: assertion failed

2016-11-18 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676956#comment-15676956 ] Dongjoon Hyun commented on SPARK-18496: --- Hi, Harish. Do you mean the officially Apache Spark 2.0.2

[jira] [Commented] (SPARK-10872) Derby error (XSDB6) when creating new HiveContext after restarting SparkContext

2016-11-18 Thread Michal W (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676944#comment-15676944 ] Michal W commented on SPARK-10872: -- I'm having the same issue on 1.6.1. It's quite inconvenient when

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-18 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676908#comment-15676908 ] Aleksey Ponkin commented on SPARK-18252: The only thing that can be improved, IMHO - using right

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-18 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676899#comment-15676899 ] Aleksey Ponkin commented on SPARK-18252: I did benchmarks (you can find it

[jira] [Assigned] (SPARK-18448) SparkSession should implement java.lang.AutoCloseable like JavaSparkContext

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18448: Assignee: Apache Spark > SparkSession should implement java.lang.AutoCloseable like

[jira] [Commented] (SPARK-18448) SparkSession should implement java.lang.AutoCloseable like JavaSparkContext

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676878#comment-15676878 ] Apache Spark commented on SPARK-18448: -- User 'srowen' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18448) SparkSession should implement java.lang.AutoCloseable like JavaSparkContext

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18448: Assignee: (was: Apache Spark) > SparkSession should implement java.lang.AutoCloseable

[jira] [Created] (SPARK-18502) Spark does not handle columns that contain backquote (`)

2016-11-18 Thread Barry Becker (JIRA)
Barry Becker created SPARK-18502: Summary: Spark does not handle columns that contain backquote (`) Key: SPARK-18502 URL: https://issues.apache.org/jira/browse/SPARK-18502 Project: Spark

[jira] [Commented] (SPARK-11977) Support accessing a DataFrame column using its name without backticks if the name contains '.'

2016-11-18 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676856#comment-15676856 ] Barry Becker commented on SPARK-11977: -- I would also like to know how to handle columns that contain

[jira] [Comment Edited] (SPARK-18484) case class datasets - ability to specify decimal precision and scale

2016-11-18 Thread Damian Momot (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673347#comment-15673347 ] Damian Momot edited comment on SPARK-18484 at 11/18/16 1:51 PM: Only

[jira] [Commented] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-18 Thread Damian Momot (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676734#comment-15676734 ] Damian Momot commented on SPARK-18249: -- Just check that it also happens on 2.0.2 and 2.1.0 (latest

[jira] [Updated] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-18 Thread Damian Momot (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damian Momot updated SPARK-18249: - Affects Version/s: 2.1.0 Summary: StackOverflowError when saving dataset to parquet

[jira] [Updated] (SPARK-18249) Spark 2.0.1 - StackOverflowError when saving dataset to parquet

2016-11-18 Thread Damian Momot (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damian Momot updated SPARK-18249: - Affects Version/s: 2.0.2 > Spark 2.0.1 - StackOverflowError when saving dataset to parquet >

[jira] [Resolved] (SPARK-18393) DataFrame pivot output column names should respect aliases

2016-11-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18393. --- Resolution: Duplicate > DataFrame pivot output column names should respect aliases >

[jira] [Commented] (SPARK-18471) In treeAggregate, generate (big) zeros instead of sending them.

2016-11-18 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676386#comment-15676386 ] Apache Spark commented on SPARK-18471: -- User 'AnthonyTruchet' has created a pull request for this

[jira] [Resolved] (SPARK-12278) Move the shuffle related test case from Yarn module to Core module

2016-11-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12278. --- Resolution: Not A Problem I don't see a suite by this name any more anyway. It sounds like this

[jira] [Commented] (SPARK-18356) Issue + Resolution: Kmeans Spark Performances (ML package)

2016-11-18 Thread zakaria hili (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676337#comment-15676337 ] zakaria hili commented on SPARK-18356: -- if you don't mind, yes > Issue + Resolution: Kmeans Spark
