[jira] [Created] (SPARK-39897) StackOverflowError in TaskMemoryManager

2022-07-27 Thread Andrew Ray (Jira)
Andrew Ray created SPARK-39897: -- Summary: StackOverflowError in TaskMemoryManager Key: SPARK-39897 URL: https://issues.apache.org/jira/browse/SPARK-39897 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-39883) Add DataFrame function parity check

2022-07-26 Thread Andrew Ray (Jira)
Andrew Ray created SPARK-39883: -- Summary: Add DataFrame function parity check Key: SPARK-39883 URL: https://issues.apache.org/jira/browse/SPARK-39883 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-39734) Add call_udf to pyspark.sql.functions

2022-07-10 Thread Andrew Ray (Jira)
Andrew Ray created SPARK-39734: -- Summary: Add call_udf to pyspark.sql.functions Key: SPARK-39734 URL: https://issues.apache.org/jira/browse/SPARK-39734 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-39733) Add map_contains_key to pyspark.sql.functions

2022-07-10 Thread Andrew Ray (Jira)
Andrew Ray created SPARK-39733: -- Summary: Add map_contains_key to pyspark.sql.functions Key: SPARK-39733 URL: https://issues.apache.org/jira/browse/SPARK-39733 Project: Spark Issue Type:

[jira] [Updated] (SPARK-39728) Test for parity of SQL functions between Python and JVM DataFrame API's

2022-07-09 Thread Andrew Ray (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ray updated SPARK-39728: --- Priority: Minor (was: Major) > Test for parity of SQL functions between Python and JVM DataFrame

[jira] [Created] (SPARK-39728) Test for parity of SQL functions between Python and JVM DataFrame API's

2022-07-09 Thread Andrew Ray (Jira)
Andrew Ray created SPARK-39728: -- Summary: Test for parity of SQL functions between Python and JVM DataFrame API's Key: SPARK-39728 URL: https://issues.apache.org/jira/browse/SPARK-39728 Project: Spark

[jira] [Created] (SPARK-21628) Explicitly specify Java version in maven compiler plugin so IntelliJ imports project correctly

2017-08-03 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-21628: -- Summary: Explicitly specify Java version in maven compiler plugin so IntelliJ imports project correctly Key: SPARK-21628 URL: https://issues.apache.org/jira/browse/SPARK-21628

[jira] [Commented] (SPARK-21034) Filter not getting pushed down the groupBy clause when first() or last() aggregate function is used

2017-08-02 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111454#comment-16111454 ] Andrew Ray commented on SPARK-21034: Yes, a=1 is the filter to be pushed down. It is not pushed

[jira] [Commented] (SPARK-21034) Filter not getting pushed down the groupBy clause when first() or last() aggregate function is used

2017-08-02 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1631#comment-1631 ] Andrew Ray commented on SPARK-21034: {{first}} is not a deterministic function and thus filters are

[jira] [Commented] (SPARK-21110) Structs should be usable in inequality filters

2017-08-02 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111093#comment-16111093 ] Andrew Ray commented on SPARK-21110: https://github.com/apache/spark/pull/18818 > Structs should be

[jira] [Commented] (SPARK-21110) Structs should be usable in inequality filters

2017-08-01 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109734#comment-16109734 ] Andrew Ray commented on SPARK-21110: I'm working on this > Structs should be usable in inequality

[jira] [Commented] (SPARK-21330) Bad partitioning does not allow to read a JDBC table with extreme values on the partition column

2017-08-01 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109568#comment-16109568 ] Andrew Ray commented on SPARK-21330: https://github.com/apache/spark/pull/18800 > Bad partitioning

[jira] [Commented] (SPARK-21565) aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp

2017-07-31 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108006#comment-16108006 ] Andrew Ray commented on SPARK-21565: No, nothing like the limitations of microbatches. The window can

[jira] [Commented] (SPARK-21565) aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp

2017-07-31 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107933#comment-16107933 ] Andrew Ray commented on SPARK-21565: I believe you need to use a window to group by your event time.

[jira] [Updated] (SPARK-21584) Update R method for summary to call new implementation

2017-07-31 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ray updated SPARK-21584: --- Component/s: SQL > Update R method for summary to call new implementation >

[jira] [Created] (SPARK-21584) Update R method for summary to call new implementation

2017-07-31 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-21584: -- Summary: Update R method for summary to call new implementation Key: SPARK-21584 URL: https://issues.apache.org/jira/browse/SPARK-21584 Project: Spark Issue

[jira] [Created] (SPARK-21566) Python method for summary

2017-07-28 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-21566: -- Summary: Python method for summary Key: SPARK-21566 URL: https://issues.apache.org/jira/browse/SPARK-21566 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-21100) Add summary method as alternative to describe that gives quartiles similar to Pandas

2017-07-05 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ray updated SPARK-21100: --- Summary: Add summary method as alternative to describe that gives quartiles similar to Pandas (was:

[jira] [Commented] (SPARK-21184) QuantileSummaries implementation is wrong and QuantileSummariesSuite fails with larger n

2017-06-28 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067167#comment-16067167 ] Andrew Ray commented on SPARK-21184: Also the lookup queries are just wrong {code} scala> Seq(1,

[jira] [Created] (SPARK-21184) QuantileSummaries implementation is wrong and QuantileSummariesSuite fails with larger n

2017-06-22 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-21184: -- Summary: QuantileSummaries implementation is wrong and QuantileSummariesSuite fails with larger n Key: SPARK-21184 URL: https://issues.apache.org/jira/browse/SPARK-21184

[jira] [Created] (SPARK-21100) describe should give quartiles similar to Pandas

2017-06-14 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-21100: -- Summary: describe should give quartiles similar to Pandas Key: SPARK-21100 URL: https://issues.apache.org/jira/browse/SPARK-21100 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-20839) Incorrect Dynamic PageRank calculation

2017-06-14 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ray resolved SPARK-20839. Resolution: Not A Problem > Incorrect Dynamic PageRank calculation >

[jira] [Commented] (SPARK-20839) Incorrect Dynamic PageRank calculation

2017-06-14 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049214#comment-16049214 ] Andrew Ray commented on SPARK-20839: 1 & 2 work together to do the algorithm properly with an active

[jira] [Created] (SPARK-20769) Incorrect documentation for using Jupyter notebook

2017-05-16 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-20769: -- Summary: Incorrect documentation for using Jupyter notebook Key: SPARK-20769 URL: https://issues.apache.org/jira/browse/SPARK-20769 Project: Spark Issue Type:

[jira] [Commented] (SPARK-20429) [GRAPHX] Strange results for personalized pagerank if node is involved in a cycle

2017-05-01 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991099#comment-15991099 ] Andrew Ray commented on SPARK-20429: Can you retest your example with Spark 2.2/master? SPARK-18847

[jira] [Resolved] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException

2017-03-23 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ray resolved SPARK-19136. Resolution: Not A Bug > Aggregator with case class as output type fails with ClassCastException >

[jira] [Commented] (SPARK-16683) Group by does not work after multiple joins of the same dataframe

2017-01-20 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831878#comment-15831878 ] Andrew Ray commented on SPARK-16683: I'm working on a solution for this > Group by does not work

[jira] [Commented] (SPARK-18568) vertex attributes in the edge triplet not getting updated in super steps for Pregel API

2017-01-13 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822397#comment-15822397 ] Andrew Ray commented on SPARK-18568: RDD's have the same problem for cached collections of mutable

[jira] [Commented] (SPARK-19116) LogicalPlan.statistics.sizeInBytes wrong for trivial parquet file

2017-01-13 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821914#comment-15821914 ] Andrew Ray commented on SPARK-19116: The 2318 number is the size of the parquet files written to disk

[jira] [Commented] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException

2017-01-13 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821868#comment-15821868 ] Andrew Ray commented on SPARK-19136: I forgot you can also just do: {code}

[jira] [Commented] (SPARK-8853) FPGrowth is not Java-Friendly

2017-01-10 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816049#comment-15816049 ] Andrew Ray commented on SPARK-8853: --- But there is no reason to directly create a {{FPGrowthModel}},

[jira] [Commented] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException

2017-01-10 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815460#comment-15815460 ] Andrew Ray commented on SPARK-19136: You did not do a _typed_ aggregation, so your result is a

[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases

2016-12-27 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781473#comment-15781473 ] Andrew Ray commented on SPARK-18393: It wouldn't hurt to backport to 2.0; it's a pretty simple fix. >

[jira] [Commented] (SPARK-18847) PageRank gives incorrect results for graphs with sinks

2016-12-13 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746395#comment-15746395 ] Andrew Ray commented on SPARK-18847: I have and have not found any relevant. I'm currently working on

[jira] [Commented] (SPARK-18845) PageRank has incorrect initialization value that leads to slow convergence

2016-12-13 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746385#comment-15746385 ] Andrew Ray commented on SPARK-18845: [~srowen] No that's a different thing just whether the result

[jira] [Created] (SPARK-18848) PageRank gives incorrect results for graphs with sinks

2016-12-13 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-18848: -- Summary: PageRank gives incorrect results for graphs with sinks Key: SPARK-18848 URL: https://issues.apache.org/jira/browse/SPARK-18848 Project: Spark Issue

[jira] [Created] (SPARK-18847) PageRank gives incorrect results for graphs with sinks

2016-12-13 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-18847: -- Summary: PageRank gives incorrect results for graphs with sinks Key: SPARK-18847 URL: https://issues.apache.org/jira/browse/SPARK-18847 Project: Spark Issue

[jira] [Created] (SPARK-18845) PageRank has incorrect initialization value that leads to slow convergence

2016-12-13 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-18845: -- Summary: PageRank has incorrect initialization value that leads to slow convergence Key: SPARK-18845 URL: https://issues.apache.org/jira/browse/SPARK-18845 Project:

[jira] [Commented] (SPARK-17859) persist should not impede with spark's ability to perform a broadcast join.

2016-12-08 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733247#comment-15733247 ] Andrew Ray commented on SPARK-17859: this appears to be fixed in 2.0.2 {code} scala>

[jira] [Commented] (SPARK-18717) Datasets - crash (compile exception) when mapping to immutable scala map

2016-12-05 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723527#comment-15723527 ] Andrew Ray commented on SPARK-18717: I have a fix for this, will make a PR in a bit > Datasets -

[jira] [Commented] (SPARK-18717) Datasets - crash (compile exception) when mapping to immutable scala map

2016-12-05 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723499#comment-15723499 ] Andrew Ray commented on SPARK-18717: Use `scala.collection.Map` as the type in your case class

[jira] [Commented] (SPARK-11705) Eliminate unnecessary Cartesian Join

2016-12-02 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715675#comment-15715675 ] Andrew Ray commented on SPARK-11705: Above example does not have a cartesian product in Spark 2.0.2

[jira] [Commented] (SPARK-17896) Dataset groupByKey + reduceGroups fails with codegen-related exception

2016-11-28 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15703320#comment-15703320 ] Andrew Ray commented on SPARK-17896: The given code seems to work in 2.0.2 > Dataset groupByKey +

[jira] [Created] (SPARK-18457) ORC and other columnar formats using HiveShim read all columns when doing a simple count

2016-11-15 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-18457: -- Summary: ORC and other columnar formats using HiveShim read all columns when doing a simple count Key: SPARK-18457 URL: https://issues.apache.org/jira/browse/SPARK-18457

[jira] [Commented] (SPARK-17458) Alias specified for aggregates in a pivot are not honored

2016-09-15 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494591#comment-15494591 ] Andrew Ray commented on SPARK-17458: [~hvanhovell]: My JIRA username is a1ray. > Alias specified for

[jira] [Issue Comment Deleted] (SPARK-17458) Alias specified for aggregates in a pivot are not honored

2016-09-15 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ray updated SPARK-17458: --- Comment: was deleted (was: [~hvanhovell] It's a1ray) > Alias specified for aggregates in a pivot

[jira] [Comment Edited] (SPARK-17458) Alias specified for aggregates in a pivot are not honored

2016-09-15 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494361#comment-15494361 ] Andrew Ray edited comment on SPARK-17458 at 9/15/16 8:09 PM: - [~hvanhovell]

[jira] [Commented] (SPARK-17458) Alias specified for aggregates in a pivot are not honored

2016-09-15 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494361#comment-15494361 ] Andrew Ray commented on SPARK-17458: It's a1ray > Alias specified for aggregates in a pivot are not

[jira] [Created] (SPARK-13749) Faster pivot implementation for many distinct values with two phase aggregation

2016-03-08 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-13749: -- Summary: Faster pivot implementation for many distinct values with two phase aggregation Key: SPARK-13749 URL: https://issues.apache.org/jira/browse/SPARK-13749 Project:

[jira] [Commented] (SPARK-12911) Cacheing a dataframe causes array comparisons to fail (in filter / where) after 1.6

2016-01-20 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109269#comment-15109269 ] Andrew Ray commented on SPARK-12911: In the current master this happens even without caching. The

[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse

2015-12-16 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060208#comment-15060208 ] Andrew Ray commented on SPARK-9042: --- Sean, I think there are a couple issues going on here. In my

[jira] [Created] (SPARK-12205) Pivot fails Analysis when aggregate is UnresolvedFunction

2015-12-08 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-12205: -- Summary: Pivot fails Analysis when aggregate is UnresolvedFunction Key: SPARK-12205 URL: https://issues.apache.org/jira/browse/SPARK-12205 Project: Spark Issue

[jira] [Created] (SPARK-12211) Incorrect version number in graphx doc for migration from 1.1

2015-12-08 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-12211: -- Summary: Incorrect version number in graphx doc for migration from 1.1 Key: SPARK-12211 URL: https://issues.apache.org/jira/browse/SPARK-12211 Project: Spark

[jira] [Created] (SPARK-12184) Make python api doc for pivot consistant with scala doc

2015-12-07 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-12184: -- Summary: Make python api doc for pivot consistant with scala doc Key: SPARK-12184 URL: https://issues.apache.org/jira/browse/SPARK-12184 Project: Spark Issue

[jira] [Created] (SPARK-11690) Add pivot to python api

2015-11-11 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-11690: -- Summary: Add pivot to python api Key: SPARK-11690 URL: https://issues.apache.org/jira/browse/SPARK-11690 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-11275) [SQL] Regression in rollup/cube

2015-10-29 Thread Andrew Ray (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981338#comment-14981338 ] Andrew Ray commented on SPARK-11275: I think that I understand what is happening here. Any expression

[jira] [Created] (SPARK-8718) Improve EdgePartition2D for non perfect square number of partitions

2015-06-29 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-8718: - Summary: Improve EdgePartition2D for non perfect square number of partitions Key: SPARK-8718 URL: https://issues.apache.org/jira/browse/SPARK-8718 Project: Spark

[jira] [Created] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2015-01-08 Thread Andrew Ray (JIRA)
Andrew Ray created SPARK-5159: - Summary: Thrift server does not respect hive.server2.enable.doAs=true Key: SPARK-5159 URL: https://issues.apache.org/jira/browse/SPARK-5159 Project: Spark Issue