[jira] [Created] (SPARK-9978) Window functions require partitionBy to work as expected

2015-08-14 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-9978: - Summary: Window functions require partitionBy to work as expected Key: SPARK-9978 URL: https://issues.apache.org/jira/browse/SPARK-9978 Project: Spark

[jira] [Updated] (SPARK-9978) Window functions require partitionBy to work as expected

2015-08-14 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-9978: -- Description: I am trying to reproduce following SQL query: {code}

[jira] [Created] (SPARK-9098) Inconsistent Dense Vectors hashing between PySpark and Scala

2015-07-16 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-9098: - Summary: Inconsistent Dense Vectors hashing between PySpark and Scala Key: SPARK-9098 URL: https://issues.apache.org/jira/browse/SPARK-9098 Project: Spark

[jira] [Commented] (SPARK-11167) Incorrect type resolution on heterogeneous data structures

2015-10-23 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970839#comment-14970839 ] Maciej Szymkiewicz commented on SPARK-11167: spark-csv has a much simpler job to do and

[jira] [Created] (SPARK-11283) List column gets additional level of nesting when converted to Spark DataFrame

2015-10-23 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-11283: -- Summary: List column gets additional level of nesting when converted to Spark DataFrame Key: SPARK-11283 URL: https://issues.apache.org/jira/browse/SPARK-11283

[jira] [Commented] (SPARK-11167) Incorrect type resolution on heterogeneous data structures

2015-10-23 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970922#comment-14970922 ] Maciej Szymkiewicz commented on SPARK-11167: Related problem:

[jira] [Created] (SPARK-11281) Issue with creating and collecting DataFrame using environments

2015-10-23 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-11281: -- Summary: Issue with creating and collecting DataFrame using environments Key: SPARK-11281 URL: https://issues.apache.org/jira/browse/SPARK-11281

[jira] [Issue Comment Deleted] (SPARK-11167) Incorrect type resolution on heterogeneous data structures

2015-10-23 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-11167: --- Comment: was deleted (was: Related problem:

[jira] [Commented] (SPARK-11569) StringIndexer transform fails when column contains nulls

2015-11-08 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995580#comment-14995580 ] Maciej Szymkiewicz commented on SPARK-11569: It looks this problem affects Scala after all:

[jira] [Commented] (SPARK-11530) Return eigenvalues with PCA model

2015-11-08 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995874#comment-14995874 ] Maciej Szymkiewicz commented on SPARK-11530: It should actually target MLlib, shouldn't it?

[jira] [Commented] (SPARK-11281) Issue with creating and collecting DataFrame using environments

2015-11-16 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007038#comment-15007038 ] Maciej Szymkiewicz commented on SPARK-11281: [~shivaram] I've tested both current master and

[jira] [Commented] (SPARK-11281) Issue with creating and collecting DataFrame using environments

2015-11-16 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006960#comment-15006960 ] Maciej Szymkiewicz commented on SPARK-11281: [~shivaram] No, there isn't. I removed this one

[jira] [Issue Comment Deleted] (SPARK-11281) Issue with creating and collecting DataFrame using environments

2015-11-16 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-11281: --- Comment: was deleted (was: [~sunrui], [~shivaram] I don't think it is resolved by

[jira] [Commented] (SPARK-11086) createDataFrame should dropFactor column-wise not cell-wise

2015-11-15 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006292#comment-15006292 ] Maciej Szymkiewicz commented on SPARK-11086: [~shivaram] Does it resolve [SPARK-8277] as

[jira] [Created] (SPARK-11569) StringIndexer transform fails when column contains nulls

2015-11-07 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-11569: -- Summary: StringIndexer transform fails when column contains nulls Key: SPARK-11569 URL: https://issues.apache.org/jira/browse/SPARK-11569 Project: Spark

[jira] [Created] (SPARK-11167) Incorrect type resolution on heterogeneous data structures

2015-10-17 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-11167: -- Summary: Incorrect type resolution on heterogeneous data structures Key: SPARK-11167 URL: https://issues.apache.org/jira/browse/SPARK-11167 Project: Spark

[jira] [Updated] (SPARK-10973) __gettitem__ method throws IndexError exception when we try to access index after the last non-zero entry.

2015-10-07 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-10973: --- External issue URL: https://github.com/apache/spark/pull/9009 > __gettitem__ method

[jira] [Created] (SPARK-10973) __gettitem__ method throws IndexError exception when we try to access index after the last non-zero entry.

2015-10-07 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-10973: -- Summary: __gettitem__ method throws IndexError exception when we try to access index after the last non-zero entry. Key: SPARK-10973 URL:

[jira] [Updated] (SPARK-10973) __gettitem__ method throws IndexError exception when we try to access index after the last non-zero entry.

2015-10-07 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-10973: --- External issue URL: (was: https://github.com/apache/spark/pull/9009) >

[jira] [Created] (SPARK-11084) SparseVector.__getitem__ should check if value can be non-zero before executing searchsorted

2015-10-13 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-11084: -- Summary: SparseVector.__getitem__ should check if value can be non-zero before executing searchsorted Key: SPARK-11084 URL:

[jira] [Created] (SPARK-11086) createDataFrame should dropFactor column-wise not cell-wise

2015-10-13 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-11086: -- Summary: createDataFrame should dropFactor column-wise not cell-wise Key: SPARK-11086 URL: https://issues.apache.org/jira/browse/SPARK-11086 Project:

[jira] [Updated] (SPARK-10467) Vector is converted to tuple when extracted from Row using __getitem__

2015-09-06 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-10467: --- Description: If we take a row from a data frame and try to extract vector element by

[jira] [Updated] (SPARK-10467) Vector is converted to tuple when extracted from Row using __getitem__

2015-09-06 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-10467: --- Description: If we take a row from a data frame and try to extract vector element by

[jira] [Created] (SPARK-10467) Vector is converted to tuple when extracted from Row using __getitem__

2015-09-06 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-10467: -- Summary: Vector is converted to tuple when extracted from Row using __getitem__ Key: SPARK-10467 URL: https://issues.apache.org/jira/browse/SPARK-10467

[jira] [Updated] (SPARK-10467) Vector is converted to tuple when extracted from Row using __getitem__

2015-09-06 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-10467: --- Description: If we take a row from a data frame and try to extract vector element by

[jira] [Updated] (SPARK-10467) Vector is converted to tuple when extracted from Row using __getitem__

2015-09-06 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-10467: --- Description: {code} from pyspark.ml.feature import HashingTF df =

[jira] [Issue Comment Deleted] (SPARK-7683) Confusing behavior of fold function of RDD in pyspark

2016-01-02 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-7683: -- Comment: was deleted (was: [~srowen] Do you have any example how it could break

[jira] [Commented] (SPARK-7683) Confusing behavior of fold function of RDD in pyspark

2016-01-02 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15076650#comment-15076650 ] Maciej Szymkiewicz commented on SPARK-7683: --- [~srowen] Do you have any example how it could

[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074268#comment-15074268 ] Maciej Szymkiewicz commented on SPARK-6459: --- [~marmbrus] Isn't this warning obsolete in 1.5+? >

[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074322#comment-15074322 ] Maciej Szymkiewicz commented on SPARK-6459: --- I've been trying to reproduce the problem on 1.5.2

[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074342#comment-15074342 ] Maciej Szymkiewicz commented on SPARK-6459: --- Thanks for clarification. > Warn when Column API

[jira] [Created] (SPARK-12595) fold should pass arguments to op in the correct order

2016-01-01 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-12595: -- Summary: fold should pass arguments to op in the correct order Key: SPARK-12595 URL: https://issues.apache.org/jira/browse/SPARK-12595 Project: Spark

[jira] [Updated] (SPARK-12006) GaussianMixture.train crashes if an itnital model is not None

2015-11-25 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-12006: --- Description: Steps to reproduce : {code} from pyspark.mllib.clustering import

[jira] [Created] (SPARK-12006) GaussianMixture.train crashes if an itnital model is not None

2015-11-25 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-12006: -- Summary: GaussianMixture.train crashes if an itnital model is not None Key: SPARK-12006 URL: https://issues.apache.org/jira/browse/SPARK-12006 Project:

[jira] [Commented] (SPARK-9137) Unified label verification for Classifier

2015-11-26 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029170#comment-15029170 ] Maciej Szymkiewicz commented on SPARK-9137: --- [~josephkb] Could you take a look at [this question

[jira] [Created] (SPARK-15559) TopicAndPartition should provide __hash__ method

2016-05-26 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-15559: -- Summary: TopicAndPartition should provide __hash__ method Key: SPARK-15559 URL: https://issues.apache.org/jira/browse/SPARK-15559 Project: Spark

[jira] [Updated] (SPARK-15559) TopicAndPartition should provide __hash__ method

2016-05-26 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-15559: --- Description: In Python 3.x any object that provides eq method requires hash method

[jira] [Updated] (SPARK-15559) TopicAndPartition should provide __hash__ method

2016-05-26 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-15559: --- Description: In Python 3.x any object that provides `__eq__` method requires

[jira] [Updated] (SPARK-15559) TopicAndPartition should provide __hash__ method

2016-05-26 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-15559: --- Description: In Python 3.x any object that provides {{__eq__}} method requires

[jira] [Commented] (SPARK-12824) Failure to maintain consistent RDD references in pyspark

2016-01-14 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098738#comment-15098738 ] Maciej Szymkiewicz commented on SPARK-12824: ??It seems that all the keys in the dictionary

[jira] [Comment Edited] (SPARK-12824) Failure to maintain consistent RDD references in pyspark

2016-01-14 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098738#comment-15098738 ] Maciej Szymkiewicz edited comment on SPARK-12824 at 1/14/16 7:51 PM: -

[jira] [Comment Edited] (SPARK-12824) Failure to maintain consistent RDD references in pyspark

2016-01-14 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098738#comment-15098738 ] Maciej Szymkiewicz edited comment on SPARK-12824 at 1/14/16 7:55 PM: -

[jira] [Comment Edited] (SPARK-12824) Failure to maintain consistent RDD references in pyspark

2016-01-14 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098738#comment-15098738 ] Maciej Szymkiewicz edited comment on SPARK-12824 at 1/14/16 7:56 PM: -

[jira] [Created] (SPARK-14058) Incorrect docstring in Window.orderBy

2016-03-22 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-14058: -- Summary: Incorrect docstring in Window.orderBy Key: SPARK-14058 URL: https://issues.apache.org/jira/browse/SPARK-14058 Project: Spark Issue

[jira] [Commented] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark

2016-03-22 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206030#comment-15206030 ] Maciej Szymkiewicz commented on SPARK-12916: Since PySpark `Row` is just a subclass of

[jira] [Created] (SPARK-14202) python_full_outer_join should use generator expression instead of list comp

2016-03-28 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-14202: -- Summary: python_full_outer_join should use generator expression instead of list comp Key: SPARK-14202 URL: https://issues.apache.org/jira/browse/SPARK-14202

[jira] [Updated] (SPARK-14202) python_full_outer_join should use generator expression instead of list comp

2016-03-28 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-14202: --- Affects Version/s: (was: 1.3.0) > python_full_outer_join should use generator

[jira] [Commented] (SPARK-14739) Vectors.parse doesn't handle dense vectors of size 0 and sparse vectros with no indices

2016-04-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249053#comment-15249053 ] Maciej Szymkiewicz commented on SPARK-14739: Sure, but your latest PR still doesn't resolve

[jira] [Comment Edited] (SPARK-14739) Vectors.parse doesn't handle dense vectors of size 0 and sparse vectros with no indices

2016-04-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249053#comment-15249053 ] Maciej Szymkiewicz edited comment on SPARK-14739 at 4/20/16 12:47 AM:

[jira] [Created] (SPARK-14739) Vectors.parse doesn't handle dense vectors of size 0 and sparse vectros with no indices

2016-04-19 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-14739: -- Summary: Vectors.parse doesn't handle dense vectors of size 0 and sparse vectros with no indices Key: SPARK-14739 URL:

[jira] [Commented] (SPARK-14739) Vectors.parse doesn't handle dense vectors of size 0 and sparse vectros with no indices

2016-04-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249061#comment-15249061 ] Maciej Szymkiewicz commented on SPARK-14739: I extracted relevant test fixes and made PR

[jira] [Commented] (SPARK-14739) Vectors.parse doesn't handle dense vectors of size 0 and sparse vectros with no indices

2016-04-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248994#comment-15248994 ] Maciej Szymkiewicz commented on SPARK-14739: This solves only small part of the problem.

[jira] [Commented] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-07-24 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391108#comment-15391108 ] Maciej Szymkiewicz commented on SPARK-16589: [~holdenk] Makes sense. I was thinking more

[jira] [Commented] (SPARK-14155) Hide UserDefinedType in Spark 2.0

2016-07-31 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401156#comment-15401156 ] Maciej Szymkiewicz commented on SPARK-14155: [~rxin] Is there any progress on that or some

[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs

2016-07-31 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401153#comment-15401153 ] Maciej Szymkiewicz commented on SPARK-12157: [~nchammas] You're using incorrect schema.

[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs

2016-07-31 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401190#comment-15401190 ] Maciej Szymkiewicz commented on SPARK-12157: Well, it is alpha component (see Scala API docs

[jira] [Created] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow

2016-08-11 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-17027: -- Summary: PolynomialExpansion.choose is prone to integer overflow Key: SPARK-17027 URL: https://issues.apache.org/jira/browse/SPARK-17027 Project: Spark

[jira] [Comment Edited] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow

2016-08-11 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418071#comment-15418071 ] Maciej Szymkiewicz edited comment on SPARK-17027 at 8/11/16 10:38 PM:

[jira] [Commented] (SPARK-17027) PolynomialExpansion.choose is prone to integer overflow

2016-08-12 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418071#comment-15418071 ] Maciej Szymkiewicz commented on SPARK-17027: Yes, this exactly the problem. {code}

[jira] [Commented] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-07-18 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382614#comment-15382614 ] Maciej Szymkiewicz commented on SPARK-16589: [~dongjoon] I'll work on that but I am not

[jira] [Created] (SPARK-16626) Code duplication after SPARK-14906

2016-07-19 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-16626: -- Summary: Code duplication after SPARK-14906 Key: SPARK-16626 URL: https://issues.apache.org/jira/browse/SPARK-16626 Project: Spark Issue Type:

[jira] [Commented] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-07-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384235#comment-15384235 ] Maciej Szymkiewicz commented on SPARK-16589: Thanks [~dongjoon]. [~joshrosen] Could you

[jira] [Updated] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-07-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-16589: --- Description: Chaining cartesian calls in PySpark results in the number of records

[jira] [Updated] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-07-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-16589: --- Affects Version/s: 1.4.0 1.5.0 > Chained cartesian produces

[jira] [Created] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-07-16 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-16589: -- Summary: Chained cartesian produces incorrect number of records Key: SPARK-16589 URL: https://issues.apache.org/jira/browse/SPARK-16589 Project: Spark

[jira] [Updated] (SPARK-19453) Correct Column.replace docs

2017-02-03 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19453: --- Summary: Correct Column.replace docs (was: Correct ) > Correct Column.replace docs

[jira] [Created] (SPARK-19453) Correct

2017-02-03 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19453: -- Summary: Correct Key: SPARK-19453 URL: https://issues.apache.org/jira/browse/SPARK-19453 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-19453) Correct DataFrame.replace docs

2017-02-03 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19453: --- Summary: Correct DataFrame.replace docs (was: Correct Column.replace docs) >

[jira] [Created] (SPARK-19454) Improve DataFrame.replace API

2017-02-03 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19454: -- Summary: Improve DataFrame.replace API Key: SPARK-19454 URL: https://issues.apache.org/jira/browse/SPARK-19454 Project: Spark Issue Type:

[jira] [Created] (SPARK-19427) UserDefinedFunction should support data types strings

2017-02-01 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19427: -- Summary: UserDefinedFunction should support data types strings Key: SPARK-19427 URL: https://issues.apache.org/jira/browse/SPARK-19427 Project: Spark

[jira] [Commented] (SPARK-13802) Fields order in Row(**kwargs) is not consistent with Schema.toInternal method

2017-02-04 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853008#comment-15853008 ] Maciej Szymkiewicz commented on SPARK-13802: [~szymonm] Realistically it is rather unlikely

[jira] [Comment Edited] (SPARK-13802) Fields order in Row(**kwargs) is not consistent with Schema.toInternal method

2017-02-04 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853008#comment-15853008 ] Maciej Szymkiewicz edited comment on SPARK-13802 at 2/5/17 12:45 AM: -

[jira] [Updated] (SPARK-19162) UserDefinedFunction constructor should verify that func is callable

2017-02-08 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19162: --- Priority: Minor (was: Major) > UserDefinedFunction constructor should verify that

[jira] [Updated] (SPARK-19161) Improving UDF Docstrings

2017-02-08 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19161: --- Priority: Minor (was: Major) > Improving UDF Docstrings >

[jira] [Updated] (SPARK-19161) Improving UDF Docstrings

2017-02-08 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19161: --- Priority: Major (was: Minor) > Improving UDF Docstrings >

[jira] [Updated] (SPARK-19165) UserDefinedFunction should verify call arguments and provide readable exception in case of mismatch

2017-02-08 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19165: --- Priority: Minor (was: Major) > UserDefinedFunction should verify call arguments and

[jira] [Created] (SPARK-19506) Missing warnings import in pyspark.ml.util

2017-02-07 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19506: -- Summary: Missing warnings import in pyspark.ml.util Key: SPARK-19506 URL: https://issues.apache.org/jira/browse/SPARK-19506 Project: Spark Issue

[jira] [Created] (SPARK-19475) (ML|MLlib).linalg.DenseVector method delegation fails for __neg__

2017-02-06 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19475: -- Summary: (ML|MLlib).linalg.DenseVector method delegation fails for __neg__ Key: SPARK-19475 URL: https://issues.apache.org/jira/browse/SPARK-19475

[jira] [Created] (SPARK-19467) PySpark ML shouldn't use circular imports

2017-02-05 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19467: -- Summary: PySpark ML shouldn't use circular imports Key: SPARK-19467 URL: https://issues.apache.org/jira/browse/SPARK-19467 Project: Spark Issue

[jira] [Created] (SPARK-19429) Column.__getitem__ should support slice arguments

2017-02-01 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19429: -- Summary: Column.__getitem__ should support slice arguments Key: SPARK-19429 URL: https://issues.apache.org/jira/browse/SPARK-19429 Project: Spark

[jira] [Commented] (SPARK-12157) Support numpy types as return values of Python UDFs

2017-01-21 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832907#comment-15832907 ] Maciej Szymkiewicz commented on SPARK-12157: I've been looking at this in context of

[jira] (SPARK-19403) pyspark.sql.column exports non-existent names

2017-01-30 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created an issue

[jira] (SPARK-15559) TopicAndPartition should provide __hash__ method

2017-01-30 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz closed an issue as Duplicate

[jira] (SPARK-15559) TopicAndPartition should provide __hash__ method

2017-01-30 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz commented on SPARK-15559

[jira] (SPARK-13802) Fields order in Row(**kwargs) is not consistent with Schema.toInternal method

2017-01-30 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz commented on SPARK-13802

[jira] [Comment Edited] (SPARK-16931) PySpark access to data-frame bucketing api

2017-02-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873710#comment-15873710 ] Maciej Szymkiewicz edited comment on SPARK-16931 at 2/19/17 2:09 PM: -

[jira] [Reopened] (SPARK-16931) PySpark access to data-frame bucketing api

2017-02-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reopened SPARK-16931: Should be implemented to achieve feature parity. > PySpark access to data-frame

[jira] [Commented] (SPARK-16931) PySpark access to data-frame bucketing api

2017-02-19 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873709#comment-15873709 ] Maciej Szymkiewicz commented on SPARK-16931: Thanks [~sowen]. I'll reopen this and if there

[jira] [Created] (SPARK-19728) PythonUDF with multiple parents shouldn't be pushed down when used as a predicat

2017-02-24 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-19728: -- Summary: PythonUDF with multiple parents shouldn't be pushed down when used as a predicat Key: SPARK-19728 URL: https://issues.apache.org/jira/browse/SPARK-19728

[jira] [Updated] (SPARK-19728) PythonUDF with multiple parents shouldn't be pushed down when used as a predicate

2017-02-24 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19728: --- Summary: PythonUDF with multiple parents shouldn't be pushed down when used as a

[jira] [Commented] (SPARK-16931) PySpark access to data-frame bucketing api

2017-02-18 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873329#comment-15873329 ] Maciej Szymkiewicz commented on SPARK-16931: [~sowen] Is there any particular reason for

[jira] [Resolved] (SPARK-19163) Lazy creation of the _judf

2017-02-14 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-19163. Resolution: Fixed Fix Version/s: 2.2.0 > Lazy creation of the _judf >

[jira] [Commented] (SPARK-19163) Lazy creation of the _judf

2017-02-13 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864110#comment-15864110 ] Maciej Szymkiewicz commented on SPARK-19163: [~holdenk] I see you've sorted out Jira

[jira] [Updated] (SPARK-19224) [PYSPARK] Python tests organization

2017-01-16 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-19224: --- Target Version/s: (was: 2.2.0) Fix Version/s: (was: 2.2.0) > [PYSPARK]

[jira] [Commented] (SPARK-13802) Fields order in Row(**kwargs) is not consistent with Schema.toInternal method

2017-02-26 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884939#comment-15884939 ] Maciej Szymkiewicz commented on SPARK-13802: [~szymonm] Do you have anything particular in

[jira] [Commented] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-10-04 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545239#comment-15545239 ] Maciej Szymkiewicz commented on SPARK-16589: Not actively so if you want to give it a shot go

[jira] [Commented] (SPARK-17587) SparseVector __getitem__ should follow __getitem__ contract

2016-10-04 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545262#comment-15545262 ] Maciej Szymkiewicz commented on SPARK-17587: I would probably go with 2.1.0 alone and

[jira] [Commented] (SPARK-16626) Code duplication after SPARK-14906

2016-10-07 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554978#comment-15554978 ] Maciej Szymkiewicz commented on SPARK-16626: Oh well. Let's mark it a won't fix. No reason to

[jira] [Created] (SPARK-17587) SparseVector __getitem__ should follow __getitem__ contract

2016-09-18 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-17587: -- Summary: SparseVector __getitem__ should follow __getitem__ contract Key: SPARK-17587 URL: https://issues.apache.org/jira/browse/SPARK-17587 Project:

[jira] [Created] (SPARK-17756) java.lang.ClassCastException when using cartesian with DStream.transform

2016-10-01 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-17756: -- Summary: java.lang.ClassCastException when using cartesian with DStream.transform Key: SPARK-17756 URL: https://issues.apache.org/jira/browse/SPARK-17756
